WO 99/50426 PCT/US99/06474
VECTORS FOR GENE MUTAGENESIS AND GENE DISCOVERY 



1.0. FIELD OF THE INVENTION 

The present invention relates to recombinant vectors
incorporating structural elements that, after the vectors
have integrated into the host cell genome, increase the number
of cellular genes that can be identified as well as
effectively mutated. The described vectors are important
tools for gene discovery, gene cloning, gene mutation,
gene regulation, shuttling nucleic acid sequences throughout
the genome, and gene activation and overexpression.

2.0. BACKGROUND OF THE INVENTION 

Gene trapping provides a powerful approach for
simultaneously mutating and identifying genes. Gene trap
vectors can be nonspecifically inserted into the target cell
genome, and gene trap vectors have consequently been
constructed that select for events in which the gene trap
vector has inserted into and mutated a gene. By exploiting
the cellular splicing machinery, the selectable nature of
these vectors removes the large background of insertion
events where vectors have not integrated into genes.

Most mammalian genes are divided into exons and introns.
Exons are the portions of the gene that are spliced into mRNA
and encode the protein product of a gene. In genomic DNA,
these coding exons are divided by noncoding intron sequences.
Although RNA polymerase transcribes both intron and exon
sequences, the intron sequences must be removed from the
transcript so that the resulting mRNA can be translated into
protein. Accordingly, all mammalian, and most eukaryotic,
cells have the machinery to splice exons into mRNA. Gene
trap vectors have been designed to integrate into introns or
genes in a manner that allows the cellular splicing machinery
to splice vector encoded exons to cellular mRNAs. Often,
such gene trap vectors contain selectable marker sequences
that are preceded by strong splice acceptor sequences and are
not preceded by a promoter. Accordingly, when such vectors
integrate into a gene, the cellular splicing machinery
splices exons from the trapped gene onto the 5' end of the
selectable marker sequence. Typically, such selectable
marker genes can only be expressed if the vector encoding the
gene has integrated into an intron. The resulting gene trap
events are subsequently identified by selecting for cells
that can survive selective culture.

Gene trapping has proven to be a very efficient method
of mutating large numbers of genes. The insertion of the
gene trap vector creates a mutation in the trapped gene, and
also provides a molecular tag that can be exploited to
identify the trapped gene. When ROSAβgeo was used to trap
genes, it was demonstrated that at least 50% of the resulting
mutations resulted in a phenotype when examined in mice.
This indicates that gene trap insertion vectors are
useful mutagens. Although gene trapping is a powerful tool
for mutating genes, the potential of the method had been
limited by the difficulty in identifying the trapped genes.
Methods that have been used to identify trap events rely on
the fusion transcripts resulting from the splicing of exon
sequences from the trapped gene to sequences encoded by the
gene trap vector. Common gene identification protocols used
to obtain sequences from these fusion transcripts include
5' RACE, cDNA cloning, and cloning of genomic DNA surrounding
the site of vector integration. However, these methods have
proven labor intensive, not readily amenable to automation,
and generally impractical for high-throughput use.

3.0. SUMMARY OF THE INVENTION 

Recently, vectors have been developed that rely on a new
gene trapping strategy in which the vector contains a
selectable marker gene preceded by a promoter and followed by
a splice donor sequence instead of a polyadenylation
sequence. These vectors do not provide selection unless they
integrate into a gene and subsequently trap downstream exons
that provide the polyadenylation sequence required for
expression of the selectable marker. Integration of such
vectors into the chromosome results in the splicing of the
selectable marker gene to 3' exons of the trapped gene.
These vectors provide a number of advantages. They can be
used to trap genes regardless of whether the genes are
normally expressed in the cell type in which the vector has
integrated. In addition, cells harboring such vectors can be
screened using automated (e.g., 96-well plate format) gene
identification assays such as 3' RACE (see generally,
Frohman, 1994, PCR Methods and Applications, 4:S40-S58).
Using these vectors it is possible to produce large numbers
of mutations and rapidly identify the mutated, or trapped,
gene. However, prior to the present invention, the
commercial scale exploitation of such vectors has been
limited by the number of target genes that can be efficiently
trapped using such vectors.
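
The selection logic that distinguishes the two vector designs can be
sketched as a small truth-table model. This is only an illustration
under simplifying assumptions: the class, field, and function names
are hypothetical, and real outcomes also depend on splicing
efficiency, reading frame, and transcript stability.

```python
from dataclasses import dataclass


@dataclass
class Insertion:
    """Hypothetical description of one vector integration site."""
    in_intron: bool                   # vector landed inside an intron of a gene
    gene_transcribed: bool            # the gene is transcribed in this cell type
    downstream_exon_with_polya: bool  # a downstream exon can supply a polyA signal


def five_prime_trap_selectable(site: Insertion) -> bool:
    # Promoterless SA-marker-pA design: the marker relies on the
    # promoter of the trapped gene, so the gene must be transcribed.
    return site.in_intron and site.gene_transcribed


def three_prime_trap_selectable(site: Insertion) -> bool:
    # Promoter-marker-SD design: the vector promoter drives the marker,
    # but a downstream exon must provide the polyadenylation sequence.
    return site.downstream_exon_with_polya


silent_gene = Insertion(in_intron=True, gene_transcribed=False,
                        downstream_exon_with_polya=True)
intergenic = Insertion(in_intron=False, gene_transcribed=False,
                       downstream_exon_with_polya=False)

print(five_prime_trap_selectable(silent_gene))   # gene not transcribed
print(three_prime_trap_selectable(silent_gene))  # polyA supplied downstream
print(three_prime_trap_selectable(intergenic))   # no exon to trap
```

In this toy model the 3' design recovers an insertion into a
transcriptionally silent gene that the promoterless design misses,
while an intergenic insertion is removed by selection in both cases.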

The relative inefficiency of first generation 3' gene
trap vectors has limited the total number of genes that can
be rapidly and practically trapped, identified, analyzed, and
effectively mutated. This inefficiency prompted the
development of more efficient methods of 3' gene trapping:
methods that allow a greater percentage of genes in the
target cell genome to be trapped and rapidly identified by,
for example, DNA sequence analysis.

The present invention relates to the construction of
novel vectors comprising a 3' gene trap cassette that allows
for high efficiency 3' gene trapping. The presently
described 3' gene trap cassette comprises, in operable
combination, a promoter region, an exon (typically
characterized by a translation initiation codon and open
reading frame and/or internal ribosome entry site), a splice
donor sequence, and, optionally, intronic sequences. The
splice donor (SD) sequence is operatively positioned such
that the exon of the 3' gene trap cassette is spliced to the
splice acceptor (SA) site of a downstream exon or a
cellularly encoded exon. As such, the described 3' gene trap
cassette (or gene trap vector incorporating the same) shall
not incorporate a splice acceptor (SA) sequence and a
polyadenylation site operatively positioned downstream from
the SD sequence of the gene trap cassette. In a preferred
embodiment, the exon component of the 3' gene trap cassette,
which also serves as a sequence acquisition cassette, will
comprise exon sequence and a splice donor sequence derived
from genetic material that naturally occurs in a eukaryotic
cell.

An additional embodiment of the present invention is the
use of the described vectors to acquire novel DNA sequence
information from gene trapped exons from an infected target
cell or a plurality of target cells.

Additional embodiments of the present invention include
recombinant vectors, particularly viral vectors, that have
been genetically engineered to incorporate the described 3'
gene trap cassette. Preferably, although not necessarily,
these vectors will additionally incorporate a selectable
marker that allows for maintenance and detection of vector
sequence in the target cell. The selectable marker can be
utilized as a 5' gene trap cassette that is placed upstream
from, and in the same orientation as, the 3' gene trap
cassette.

An additional embodiment of the present invention is the
use of the novel 3' gene trap cassette, or vectors comprising
the same, to mutate and trap genes in a population of target
cells, or tissues, in vitro or in vivo, and/or to obtain the
polynucleotide sequence of unknown genes (i.e., discover new
genes). As such, general methods of gene mutation,
identification, and phenotypic screening are described that
use the described 3' gene trap cassette, and vectors
comprising the same.

Another embodiment of the present invention is the use
of the presently described vectors (e.g., viral vectors
comprising the described 3' gene trap cassette) to activate
gene expression in target cells. Preferably, the vectors are
retroviral vectors that are nonspecifically integrated (using
viral integration machinery) into the target cell genome.
Additionally, assays are described that employ the described
3' gene trap cassette, or vectors incorporating the same, to
activate, genetically or phenotypically select for, and
subsequently identify new genes.

Additional embodiments of the presently described
invention include libraries of eukaryotic cells having genes
that have been simultaneously mutated (by one or more of the
described mutagenic components), and identified (using the
described 3' gene trap cassette) using the described vectors,
and/or cDNA libraries produced by exploiting the targeting
frequency and the sequence acquisition features of the
described vectors.

Another embodiment of the present invention is a method
of obtaining DNA sequence information from a target cell,
comprising the steps of nonspecifically integrating a 3' gene
trap cassette, obtaining the chimeric RNA transcript produced
when the gene trap cassette is spliced by the target cell's
endogenous splicing machinery to an endogenous exon encoded
within the target cell genome, and obtaining sequence
information from the endogenously encoded exon from the
target cell genome.

4.0. DESCRIPTION OF THE FIGURES 

Figure 1 presents a diagrammatic representation of how
the presently described 3' gene trap cassette is spliced to
cellular exons after the cassette is incorporated into the
target cell genome.

Figure 2 shows a dual (5' and 3') gene trap vector that
incorporates a selectable marker in the 5' trap and the
presently described 3' gene trap. Figure 2 also shows the
positions of recombinase recognition sites (e.g., frt or lox)
that can be located, for example, 5' to the promoter of the
3' gene trap cassette and 3' to the SD of the 3' gene trap
cassette. The displayed features are in reverse orientation
relative to the flanking LTRs.

5.0. DETAILED DESCRIPTION OF THE INVENTION

In the modern age of genomics, gene trapping has proven
to be a powerful approach for both grouping gene sequences
into functional categories, and identifying novel genes. For
example, initial results have shown that about half of the
gene trap events from embryonic stem cells thus far
characterized identify gene sequences that have not been
previously discovered by traditional cDNA library technology.

Gene trapping (using promoter traps) has been used in a
variety of cell types to genetically screen for genes that
are induced by inductive signals, differentiation events, or
phenotypes of interest (i.e., in gene discovery).
Additionally, such screens have been used to identify tumor
suppressor genes, genes induced by cellular differentiation
processes such as hematopoietic and muscle cell
differentiation, genes induced by signals that induce
cellular events such as B cell activation or apoptosis, and
genes activated by small molecules or other compounds. These
studies indicate that gene trapping can be used to group
genes based upon their function in important cellular and
physiological processes. However, the broader exploitation
of these screens has been limited by the difficulty of
identifying the trapped genes.

Several of the issues that must generally be addressed
when designing gene trap vectors include, but are not limited
to: 1) the percentage of the target cell genome that can be
effectively trapped by a given vector ("target size"); 2) the
mutagenicity of the vector after insertion into a gene in a
target cell; and 3) the identification of the mutated gene by
sequencing the chimeric transcript produced by the gene trap
event. The present vectors have been engineered to address
the above concerns by, for example, incorporating features
that optimize the efficiency of the splice acceptors and
splice donors present in the vectors.

5.1. The Broad Applicability Of The Described Vectors

The presently described vectors can be used in virtually
any type of eukaryotic cell that can be manipulated to insert
a gene trap vector into the genome of the cell. For example,
vectors that incorporate the presently described 3' gene trap
cassette can be used to trap genes and/or acquire sequence
information from primary animal tissues as well as any other
eukaryotic cell or organism including, but not limited to,
yeast, molds, fungi, and plants. Plants of particular
interest include dicots and monocots, angiosperms (poppies,
roses, camellias, etc.), gymnosperms (pine, etc.), sorghum,
grasses, as well as plants of agricultural significance such
as, but not limited to, grains (rice, wheat, corn, millet,
oats, etc.), nuts, lentils, chickpeas, tubers (potatoes,
yams, taro, etc.), herbs, cotton, hemp, coffee, cocoa,
tobacco, rye, beets, alfalfa, buckwheat, hay, soy beans,
bananas, sugar cane, fruits (citrus and otherwise), grapes,
vegetables, and fungi (mushrooms, truffles, etc.), palm,
maple, redwood, rape seed, safflower, saffron, coconut, yew,
oak, and other deciduous and evergreen trees. Alternatively,
linearized 3' gene trap cassettes can be introduced to target
cells using the described conventional methods of nucleotide
delivery.

Additional examples of suitable animal target cells
include, but are not limited to, mammalian, including human,
or avian endothelial cells, epithelial cells, islets, neurons
or neural tissue, mesothelial cells, osteocytes, lymphocytes,
chondrocytes, hematopoietic cells, immune cells, cells of the
major glands or organs (e.g., lung, heart, stomach, pancreas,
kidney, skin, etc.), exocrine and/or endocrine cells,
embryonic and other totipotent or pluripotent stem cells,
fibroblasts, and culture adapted and/or transformed versions
of the above, all of which can be used in conjunction with
the described vectors. Additionally, tumorigenic or other
cell lines can be targeted by the presently described vectors.

Preferred target cells for gene trapping using the
described vectors are embryonic stem cells (ES cells). ES
cells are pluripotent or totipotent. Thus, ES cells that
have been genetically engineered in vitro can subsequently
be introduced into a developing fetus or embryo (e.g., into
a morula or a blastocyst) to result in chimeric animals.
These chimeric animals can subsequently be bred to produce
offspring that are heterozygous or homozygous for the
engineered allele. In the case of mammalian animals, the ES
cells are typically microinjected into blastocysts which are
then implanted into pseudopregnant host animals.

The broad applicability of the described ES cell
technology is shown in the number of different animal systems
to which the technology has been successfully applied. For
example, and not by way of limitation, ES cells and/or
transgenic animals have been described in avian systems (U.S.
Patent No. 5,656,479), swine (U.S. Patent No. 5,523,226),
non-murine pluripotential cells (U.S. Patent No. 5,690,926),
cattle, sheep, goats, rabbits, and mink (U.S. Application
Ser. No. 60/007689 or WO1996US0018988 filed by White et al.,
and WO1997EP0002323), and human ES cells (U.S. Application
Ser. No. 08/699,040, filed by Robl et al.), all of which are
herein incorporated by reference.

Typically, vectors incorporating the presently described
features can be introduced into target cells by any of a wide
variety of methods known in the art. Examples of such
methods include, but are not limited to, electroporation,
viral infection, retrotransposition, transposition,
microparticle bombardment, microinjection, lipofection,
transfection, and delivery as cationic lipid complexes or as
non-packaged/complexed, or "naked," DNA.

The vectors described in the present invention can also
be used in conjunction with virtually any type of phenotypic
or genetic screening protocols both in vitro and in vivo, and
the presently described vectors provide the additional
advantage of enabling rapid methods of identifying the DNA
sequences of the trapped genes.

The structural features of the vectors of the present
invention can be incorporated into any vector backbone so
that the resulting construct is capable of integrating into
the genome of a eukaryotic cell in a substantially
non-specific fashion and preferably in a completely
non-specific fashion. A large number of vectors known in the
art may be used. Possible vectors include, but are not
limited to, plasmids or modified viruses, but the vector
system must be compatible with the host cell used. Such
vectors include, but are not limited to, bacteriophages such
as lambda derivatives, or plasmids such as pBR322 or pUC
plasmid derivatives or the Bluescript vector (Stratagene USA,
La Jolla, California). The insertion of the DNA fragments
corresponding to the features described below into a suitable
vector can, for example, be accomplished by ligating the
appropriate DNA fragments into the chosen vector that has
complementary cohesive termini. However, if the
complementary restriction sites of the DNA fragments are not
present in the cloning vector, the ends of the DNA molecules
may be enzymatically modified. Alternatively, any site
desired may be produced by ligating nucleotide sequences
(linkers) onto the DNA termini; these ligated linkers may
comprise specific chemically synthesized oligonucleotides
encoding restriction endonuclease recognition sequences.

5.2. Structural Features Of The Described Vectors 
5.2.1. Marker Gene 

Vectors contemplated by the present invention can
be engineered to contain selectable marker genes that provide
for the selection of cells that have incorporated the marker
into the cellular genome. In general, such selectable
markers enable facile methods of identifying and selecting
for eukaryotic cells that incorporate and express the
proteins encoded by the selectable markers. Examples of such
selection methods include antibiotic, colorimetric,
enzymatic, and fluorescent selection of cells that have
integrated a gene trap event. One example of such a
selectable marker gene is βgeo, but any of a number of other
selectable markers can be employed (for example, see U.S.
Patent No. 5,464,764 herein incorporated by reference). An
example of a plant selectable marker is hygromycin
phosphotransferase.

Accordingly, one embodiment of the present invention
contemplates vectors that are engineered to incorporate, and
optionally express, a marker gene that facilitates the
tracking and identification of target cells that incorporate
the presently described 3' gene trap cassette. Such markers
include, but are not limited to, antibiotic resistance genes,
colorimetric marker genes, enzymes (e.g., β-lactamase), or
other marker genes that mediate the direct or indirect
expression of, for example, fluorescent marker genes such as
the gene encoding green fluorescent protein, and assays for
detecting the same, which are described, inter alia, in U.S.
Patent No. 5,625,048, herein incorporated by reference. For
the purposes of the present disclosure, the term "directly,"
when used in a biological or biochemical context, refers to
direct causation of a process that does not require
intermediate steps, usually caused by one molecule contacting
or binding to another molecule (which can be a molecule of
the same type or a different type of molecule). For example,
molecule A contacts molecule B, which causes molecule B to
exert effect X that is part of a biological process. For the
purposes of the present invention, the term "indirectly,"
when used in a biological or biochemical context, refers to
indirect causation that requires intermediate steps, usually
caused by two or more direct steps. For example, molecule A
contacts molecule B to exert effect X which in turn causes
effect Y. Also for the purposes of the present invention,
the term "gene" shall refer to any and all discrete coding
regions of the cell's genome, as well as associated noncoding
and regulatory regions, or shall refer to the region encoding
a specific and functional protein product or activity.
Additionally, the term "operatively positioned" shall refer
to the fact that the control elements or genes are present in
the proper orientation and spacing to provide the desired or
indicated functions of the control elements or genes. Also
for the purposes of the present invention, a gene is
"expressed" when a control element in the cell mediates the
production of functional and/or detectable levels of mRNA
encoded by the gene, or a selectable marker inserted therein,
that can subsequently be spliced/processed and, where
applicable, translated to produce an active product. A gene
is not expressed where the relevant control element in the
cell is absent, has been inactivated, or does not mediate the
production of functional and/or detectable levels of mRNA
encoded by the gene, or a selectable marker inserted therein.
For the purposes of the present invention, a mRNA is produced
at "functional" levels if, upon translation, it produces a
protein having the size and activity normally associated with
the corresponding locus.

The marker gene can be incorporated into the described
vectors as a self-contained expression cassette including, in
operable combination, a marker, a promoter for expressing the
marker, a ribosome binding/translation start site, and a
polyadenylation sequence. Additionally, the marker can be
placed in the vector such that it is expressed from a vector
promoter, and can optionally be engineered to functionally
incorporate an internal ribosome entry site (IRES) that
facilitates marker expression.

5.2.2. 5' Gene Trap Cassette 

The presently described vectors can be engineered
to include a 5' gene trap cassette that typically contains a
splice acceptor site located 5' to an exon (which can encode
a selectable marker gene) followed by an operatively
positioned polyadenylation sequence. Typically, vectors
incorporating 5' gene traps do not contain promoters that
express the exon encoded in the 5' gene trap cassette, and do
not encode a splice donor sequence operatively positioned 5'
to the splice acceptor of the exon of the 5' gene trap
cassette. Consequently, after it is integrated into the
cellular chromosome, the 5' gene trap cassette intercepts the
normal splicing of the upstream gene and acts as a terminal
exon. The net effect is that the cellular transcript is
disrupted and effectively mutagenized by the 5' gene trap
cassette. The 5' gene trap cassette can incorporate a marker
gene as the exon component, and can thus be used in lieu of
or in addition to the marker gene described in Section 5.2.1.

The structural features of the 5' gene trap cassette can
also be manipulated to produce gene trap events that are
biased as to where the 5' gene trap has integrated into the
cellular genome (for purposes of illustration, and not
limitation, the following discussion shall assume that the
exon of the 5' gene trap cassette encodes a selectable
marker). For example, given that no promoter is present, the
marker encoded by a 5' gene trap cassette (that has been
engineered without an IRES) can typically only be expressed
if it has been integrated into an intron 5' to the
translation start site of the endogenous gene. Given the
absence of an IRES, if the vector incorporating such a 5'
gene trap cassette has integrated into an intron that is
downstream from the translation start site of the endogenous
gene, the marker can only be expressed if it is present in
the correct reading frame to produce a fusion protein that
provides selectable marker activity. Accordingly, vectors
incorporating such 5' gene trap cassettes can selectively
increase the probability that the identified gene trapped
sequences begin with sequences 5' to the start of
translation.
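
The reading-frame condition described above can be written as a
one-line check. The following is a minimal sketch, assuming a marker
that carries its own initiation codon and no IRES; the function name
and both parameters are hypothetical, and the model ignores in-frame
stop codons and fusion proteins that splice in frame but lack marker
activity.

```python
from typing import Optional


def marker_expressed(upstream_coding_nt: Optional[int],
                     linker_nt: int = 0) -> bool:
    """Predict whether an IRES-less 5' trap marker can be translated.

    upstream_coding_nt: number of endogenous coding nucleotides (counted
        from the gene's start codon) spliced upstream of the marker, or
        None when the vector sits in an intron 5' to the translation
        start site.
    linker_nt: nucleotides between the splice junction and the first
        codon of the marker (an assumed vector parameter).
    """
    if upstream_coding_nt is None:
        # No endogenous coding sequence precedes the marker, so
        # translation begins at the marker's own initiation codon.
        return True
    # Downstream of the start codon the marker must continue the
    # endogenous reading frame to yield a functional fusion protein.
    return (upstream_coding_nt + linker_nt) % 3 == 0


print(marker_expressed(None))            # insertion 5' to the start codon
print(marker_expressed(90))              # 30 complete codons upstream
print(marker_expressed(91))              # frame shifted by one nucleotide
print(marker_expressed(91, linker_nt=2)) # linker restores the frame
```

Under this model roughly one in three insertions downstream of the
start codon would be in frame, which is consistent with the bias
toward 5' sequences described above.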

An alternative method of producing a similar effect
employs vectors incorporating a nested set of stop codons
present in, or otherwise engineered into, the region between
the SA of the 5' gene trap cassette and the translation
initiation codon of the selectable marker, or such stop
codons can be located between the end of the selectable marker
coding region and the polyadenylation sequence. The
selectable marker can also be engineered to contain an
internal ribosome entry site (IRES) so that the marker
will be expressed in a manner largely independent of the
location in which the vector has integrated into the target
cell genome. Typically, but not necessarily, an IRES is not
used in conjunction with a nested set of stop codons as
described, supra.
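
One way to read "a nested set of stop codons" is a short sequence
that presents a stop codon in each of the three forward reading
frames, so that translation entering from an upstream exon terminates
regardless of frame. The sketch below checks that property; the
function names and the example sequences are illustrative
assumptions, not sequences taken from the described vectors.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frames_with_stop(seq: str) -> set:
    """Return the forward reading frames (0, 1, 2) that contain a stop codon."""
    seq = seq.upper()
    frames = set()
    for frame in range(3):
        # Walk the sequence codon by codon in this frame.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                frames.add(frame)
                break
    return frames


def blocks_all_frames(seq: str) -> bool:
    """True if translation is terminated in every forward frame."""
    return frames_with_stop(seq) == {0, 1, 2}


nested = "TAACTAGCTGA"   # hypothetical: TAA (frame 0), TAG (frame 1), TGA (frame 2)
tandem = "TAATAATAA"     # hypothetical: stops in frame 0 only

print(frames_with_stop(nested), blocks_all_frames(nested))
print(frames_with_stop(tandem), blocks_all_frames(tandem))
```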

In a particularly preferred embodiment, the described
vectors employ a 5' gene trap cassette that comprises a
selectable marker gene preceded by a splice acceptor sequence
and followed by a polyadenylation (pA) sequence (SAβgeopA,
Figure 2). Alternatively, SAIRESβgeopA can be used which
further incorporates an internal ribosome entry site upstream
from the βgeo gene, or SAneopA can be used (which dispenses
with the β-gal activity). The above 5' gene trap cassettes
can efficiently mutate genes and can be used to follow the
expression of the trapped gene. Optimizing the SA sequence
used can further enhance, or regulate, the efficiency of the
5' gene trap cassette. Examples of suitable SA sequences
include, but are not limited to:

GCAACCAGTAACCTCTGCCCTTTCTCCTCCATGACAACCAGGT (SEQ ID NO: 1);
GGCGGTCAGGCTGCCCTCTGTTCCCATTGCAGGAA (SEQ ID NO: 3);
TGTCAGTCTGTCATCCTTGCCCCTTCAGCCGCCCGGATGGCG (SEQ ID NO: 4);
TGCTGACACCCCACTGTTCCCTGCAGGACCGCCTTCAAC (SEQ ID NO: 5);
TAATTGTGTAATTATTGTTTTTCCTCCTTTAGAT (SEQ ID NO: 6);
CAGAATCTTCTTTTTAATTCCTGATTTTATTTCTATAGGA (SEQ ID NO: 7);
TACTAACATTGCCTTTTCCTCCTTCCCTCCCACAGGT (SEQ ID NO: 8);
TGCTCCACTTTGAAACAGCTGTCTTTCTTTTGCAGAT (SEQ ID NO: 9);
CTCTCTGCCTATTGGTCTATTTTCCCACCCTTAGGC (SEQ ID NO: 10); and
ATTAATTACTCTGCCCATTCCTCTCTTTCAGAGTT (SEQ ID NO: 11).
Any of the above SA sequences can be used in conjunction with,
for example, SAneopA or SAIRESneopA.
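
When comparing candidate SA sequences such as those listed above, two
features commonly associated with splice acceptors can be tabulated
quickly: an AG dinucleotide near the 3' end and the pyrimidine (C/T)
content of the region preceding it. The sketch below is a rough
heuristic under stated assumptions: it treats the last AG in the
string as the candidate acceptor dinucleotide, which may not be the
actual splice junction, and the function name is hypothetical. It
does not predict splicing efficiency.

```python
PYRIMIDINES = set("CT")


def acceptor_features(seq: str) -> dict:
    """Summarize simple splice-acceptor features of a DNA string.

    Assumption: the last 'AG' in the sequence is taken as the candidate
    acceptor dinucleotide, and everything before it as the upstream tract.
    """
    seq = seq.upper()
    ag_index = seq.rfind("AG")
    if ag_index < 0:
        raise ValueError("no AG dinucleotide found")
    upstream = seq[:ag_index]
    if upstream:
        fraction = sum(base in PYRIMIDINES for base in upstream) / len(upstream)
    else:
        fraction = 0.0
    return {"ag_index": ag_index, "pyrimidine_fraction": fraction}


# Two of the sequences listed above, copied as printed in the text.
candidates = {
    "SEQ ID NO: 6": "TAATTGTGTAATTATTGTTTTTCCTCCTTTAGAT",
    "SEQ ID NO: 9": "TGCTCCACTTTGAAACAGCTGTCTTTCTTTTGCAGAT",
}

for name, seq in candidates.items():
    print(name, acceptor_features(seq))
```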

Optionally, the 5' gene trap cassette can be flanked by
suitable recombinase sites (e.g., lox P, frt, etc.). In one
such embodiment, a recombinase site flanked 5' gene trap
cassette is used in conjunction with a second 5' gene trap
cassette (present downstream from the 3' recombinase site)
that encodes a detectable marker, a different selectable
marker, or an enzymatic marker (such as, but not limited to,
green fluorescent protein, beta lactamase, TK, blasticidin,
HPRT, etc.), and that is preferably not flanked by the
same recombinase sites as the first 5' gene trap cassette. In
the event that both of the 5' gene trap cassettes are not
expressed at acceptable levels (via alternative splicing),
the second 5' gene trap cassette (that encodes a detectable
marker) can be "activated" by using a suitable recombinase
activity (e.g., cre, flp, etc.) in vitro or in vivo to remove
the first (recombinase site flanked) 5' gene trap cassette.
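
The "activation" step can be pictured by treating the vector as an
ordered list of elements and modeling recombinase action as deletion
of everything between two recognition sites, leaving a single site
behind. This is a schematic sketch only: it assumes directly repeated
sites, the element labels (neo, GFP) are illustrative choices, and
the function name is hypothetical.

```python
def excise(elements: list, site: str = "loxP") -> list:
    """Delete the segment between the first two `site` entries.

    One copy of the site remains after recombination; if fewer than two
    sites are present the vector is returned unchanged.
    """
    positions = [i for i, element in enumerate(elements) if element == site]
    if len(positions) < 2:
        return list(elements)
    first, second = positions[0], positions[1]
    # Keep everything up to and including the first site, then resume
    # after the second site.
    return elements[:first + 1] + elements[second + 1:]


vector = ["loxP", "SA", "neo", "pA", "loxP", "SA", "GFP", "pA"]
print(excise(vector))
```

After the modeled excision only the second cassette (SA, GFP, pA)
remains downstream of the single residual site, so its splice
acceptor is now the first one encountered by the trapped transcript.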

5.2.3. 3' Gene Trap Cassette

The presently described 3' gene trap cassette
comprises, in operative combination, a promoter region that
mediates the expression of an exon, and an operative splice
donor (SD) sequence that defines the 3' end of the exon.
After integration into the target cell chromosome, the
transcript expressed by the 3' gene trap promoter is spliced
to a splice acceptor (SA) sequence of a trapped cellular exon
located downstream of the integrated 3' gene trap cassette.
Thus, a fusion transcript is generated comprising the exon of
the 3' gene trap cassette and any downstream cellular exons,
the most 3' of which has a polyadenylation signal.
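
Because every fusion transcript begins with the known cassette exon,
the trapped cellular sequence can in principle be recovered by
trimming that known prefix and the polyA tail from a cDNA read. The
sketch below models this with plain strings; the sequences are
made-up placeholders, the function names are hypothetical, and a real
pipeline would need to tolerate sequencing errors and partial reads.

```python
def fusion_transcript(cassette_exon: str, downstream_exons: list,
                      polya_len: int = 20) -> str:
    """Join the cassette exon to the trapped exons and add a polyA tail."""
    return cassette_exon + "".join(downstream_exons) + "A" * polya_len


def trapped_sequence(cdna: str, cassette_exon: str) -> str:
    """Return the cellular portion of a fusion cDNA.

    Note: stripping trailing A residues also removes any genuine A bases
    at the very end of the last exon, a limitation of this toy model.
    """
    if not cdna.startswith(cassette_exon):
        raise ValueError("read does not begin with the cassette exon")
    return cdna[len(cassette_exon):].rstrip("A")


cassette = "ATGGCC"                 # placeholder cassette exon
cellular = ["CTTCAC", "TTGACG"]     # placeholder downstream exons

read = fusion_transcript(cassette, cellular)
print(trapped_sequence(read, cassette))
```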

The fusion transcript can be identified by a variety of
methods known to those of skill in the art at any level of
expression, i.e., as a heterogeneous nuclear RNA, as a
messenger RNA, as a protein, etc. For example, one may
perform polymerase chain reaction using a primer pair
specific for the exon of the 3' gene trap cassette and the
polyA tail of the transcript. Or, for example, one may use
an exon in the 3' gene trap cassette which encodes an epitope
which can be identified in an antibody screen, i.e., epitope
tagging. Other screening methods known in the art include,
but are not limited to, hybridization (on solid support or in
solution, etc.) with a probe specific for the exon of the 3'
gene trap cassette. When screening on the protein level, one
may carry out the screen in any cellular location, e.g., one
may screen for secreted proteins encoded by the fusion
transcript. Or, for example, one may use a first exon which
encodes a secretion signal, thus making the host cells
secrete many or all fusion peptides encoded by the fusion
transcripts. All screening methods may also be modified to
render them specific for the trapped exons and the proteins
and polypeptides they encode, i.e., PCR primers,
hybridization probes or antibodies specific for a particular
gene or class of genes may be used to screen. Or, for
example, one may screen based on a posttranslational
modification, e.g., one may screen with an antibody specific
for certain or all glycoproteins.

As described above, the 3' gene trap cassette contains a
promoter that directs the expression of one or more exons
(optionally encoding one or more open reading frames) that
are followed by a splice donor sequence (Figure 1). Any
number of transcriptional promoters and enhancers may be
incorporated into the 3' gene trap cassette including, but
not limited to, cell or tissue specific promoters, inducible
promoters, the herpes simplex thymidine kinase promoter,
cytomegalovirus (CMV) promoter/enhancer, SV40 promoters, PGK
promoter, regulatable promoters (e.g., metallothionein
promoter), adenovirus late promoter, vaccinia virus 7.5K
promoter, avian (e.g., chicken) beta globin promoter,
histone promoters (e.g., mouse histone H3-614, etc.), beta
actin promoter (preferably chicken), metallothionein
promoters (preferably mouse metallothionein I and II), the
cauliflower mosaic virus 35S promoter and the like, as well
as any permutations and variations thereof, which can be
produced using well established molecular biology techniques
(see generally, Sambrook et al. (1989) Molecular Cloning
Vols. I-III, Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, New York, and Current Protocols in Molecular Biology
(1989) John Wiley & Sons, all Vols. and periodic updates
thereof, herein incorporated by reference).
Promoter/enhancer regions can also be selected to provide
tissue-specific expression or inducible expression.

Preferably, the exon (or exons) of the 3' gene trap
cassette has been designed to mimic an exon of a gene,
preferably a first exon. Generally, the exon or exons (and
part of the intron following the exon(s)) and splice donor
sequence are derived from a naturally occurring gene;
however, synthetic exons designed to mimic a real exon can
also be used. For example, such exons might be designed and
constructed de novo or by modifying existing exons to
incorporate a high efficiency, or consensus, ribosome binding
site or to add an IRES sequence 5' to the translation
initiation codon of an open reading frame or exon, to create
an open reading frame, to optimize codon usage, to engineer
one or more restriction sites that do not alter the amino
acid sequence encoded by the open reading frame, or to
engineer an alternative or consensus splice donor sequence
into the exon.

Presently described vectors use a 3' gene trap cassette
that employs an exon of non-prokaryotic origin, i.e., an exon
obtained from a eukaryotic organism. Exons useful for the 3'
gene trap cassette of the invention do not encode an
antibiotic resistance activity or other selectable marker
activity (e.g., an antibiotic resistance gene). As discussed
herein, 3' gene trap cassettes incorporating open reading
frames of noneukaryotic origin typically display a markedly
reduced efficiency of 3' exon trapping. Consequently,
vectors employing the presently described 3' gene trap
cassette greatly increase the number of target genes that can
be trapped and rapidly identified by gene trap sequence
tagging.

Accordingly, the exon of the 3' gene trap cassette
(including the SD site) is preferably derived from nucleotide
sequence that is similar or homologous to nucleotide sequence
that is native to a eukaryotic cell (or, possibly, an animal
or plant virus), or that naturally occurs in the target cell
or in the genome of cells from a related species, genus, order,
class, phylum, or kingdom. For example, an exon from a human
gene may be used in a 3' gene trap cassette that is used in
mouse cells and an exon from a mouse gene may be used in a 3'
gene trap cassette that is used in human cells. For the
purposes of the present invention, a homologous sequence is
defined as a nucleic acid sequence that is capable of binding
to a target sequence under highly stringent conditions such
as, for example, hybridization to filter-bound DNA in 0.5 M
NaHPO4, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65°C,
and washing in 0.1xSSC/0.1% SDS at 68°C (Ausubel F.M. et
al., eds., 1989, Current Protocols in Molecular Biology, Vol.
I, Green Publishing Associates, Inc., and John Wiley & Sons,
Inc., New York, at p. 2.10.3), or possibly under less
stringent conditions, such as, for example, moderately
stringent conditions, e.g., washing in 0.2xSSC/0.1% SDS at
42°C (Ausubel et al., 1989, supra). Optionally, the exon is
isogenic to sequence in the target cell genome.

Exons suitable for the 3' gene trap cassette of the
present invention may also be obtained by combining naturally
occurring exons, or by combining fragments of naturally
occurring exons, or by combining fragments of naturally
occurring exons with synthetic sequences which may be
consensus sequences of naturally occurring exons. For
example, when using an exon found in the genome of a
eukaryotic organism that is not the first exon of a gene, one
may render it useful for the 3' gene trap cassette of the
present invention by adding a suitable transcription
initiation sequence to the 5' end of the exon.

Where the target cell genome encodes a gene identical to
(or corresponding to) the exon of the 3' gene trap cassette,
the naturally occurring gene will preferably not be expressed
by the target cell at levels that substantially interfere
with the amplification and sequencing of the trapped exon
sequences in the target cells. For the purposes of the
present disclosure, the term "substantially interfere with
the amplification and sequencing" shall refer to the fact
that the endogenous expression of the naturally occurring
exon may hinder but shall not prevent the amplification and
sequencing of the trapped exon sequence by 3' RACE protocols,
or, optionally, by conventional cloning and sequencing.
Additional methods of circumventing this potential
complication include incorporating a unique sequence
within the otherwise naturally occurring exon of the 3' gene
trap cassette that can be used as a PCR priming site, or
employing a 3' gene trap cassette having an exon that does not
naturally occur in the target cell genome. Yet another
method of circumventing this potential complication is to use
an exon in the 3' gene trap cassette that is obtained from an
inducible gene, e.g., a stress gene. Preferably, in this
embodiment, the cells in which the 3' gene trap cassette is
used would be maintained under conditions so that the gene
from which the exon is obtained is not induced, or is barely
induced, if the gene is present in those cells.

The exon of the presently described 3' gene trap
cassette may or may not contain a translation start site
and/or an open reading frame. Optionally, any open reading
frame(s) that may be present in the exon can be engineered to
incorporate codons that have been optimized to reflect the
preferred codon usage of the host cell.
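
By way of illustration only, the codon optimization mentioned
above can be sketched as a synonymous substitution: each codon
of the open reading frame is replaced by the codon the host
cell is assumed to prefer for the same amino acid, so that the
encoded protein is unchanged. The preferred-codon table and the
function names below are hypothetical placeholders and do not
represent measured codon usage for any host.

```python
# Hypothetical preferred-codon table (amino acid -> codon assumed to be
# favored by the host cell). Illustrative values only, not measured data.
PREFERRED_CODON = {"M": "ATG", "L": "CTG", "K": "AAG", "E": "GAG", "*": "TGA"}

# Slice of the standard genetic code covering only the codons used here.
CODON_TO_AA = {
    "ATG": "M",
    "TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
    "AAA": "K", "AAG": "K",
    "GAA": "E", "GAG": "E",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def split_codons(orf: str) -> list:
    """Split an open reading frame into upper-case triplets."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    return [orf[i:i + 3].upper() for i in range(0, len(orf), 3)]


def translate(orf: str) -> str:
    """Translate an ORF using the partial codon table above."""
    return "".join(CODON_TO_AA[c] for c in split_codons(orf))


def optimize_codons(orf: str) -> str:
    """Replace each codon with the host-preferred synonymous codon."""
    return "".join(PREFERRED_CODON[CODON_TO_AA[c]] for c in split_codons(orf))


original = "ATGTTAAAAGAATAA"          # encodes Met-Leu-Lys-Glu-stop
optimized = optimize_codons(original)
print(optimized, translate(optimized))
```

In practice the table would be derived from codon usage data
for the actual host cell, and the full genetic code would be
used in place of the partial table shown here.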

Given that the exon of the presently described 3' gene
trap cassette preferably comprises sequence native to a
eukaryotic, or preferably mammalian, cell, the exon will
typically not constitute a marker encoding a protein that has
an antibiotic resistance activity (such as neo, amp (e.g.,
β-lactamase), tet, kan, and the like) or that otherwise confers
selectable drug resistance or sensitivity to the host cell
(although such a marker can optionally be appended to, for
example, the 5' region of the exon). For the purposes of the
present invention, a gene or gene product is capable of
"conferring" antibiotic resistance if a gene encodes a gene
product having an activity that provides a selective growth
advantage to a prokaryotic or eukaryotic cell expressing the
antibiotic resistance gene in media containing appropriate
concentrations of the corresponding antibiotic.
Alternatively, the exon will generally not encode an
enzymatic activity, or reporter gene, that mediates
selectable detection via a well known conventional
chromogenic or fluorescent assay (e.g., β-galactosidase,
alkaline phosphatase, or horseradish peroxidase) that is not
native to the, preferably mammalian, target cell.

Additionally, the presently described vectors shall
preferably not contain regions of targeting DNA sequence
(i.e., for directing gene targeting of the 3' gene trap
cassette to a specific genetic locus via homologous
recombination) flanking the described 3' gene trap cassette.
Moreover, given that splice donor efficiency can be
influenced by intron sequences downstream from the splice
donor site, the presently described 3' gene trap cassette can
optionally be engineered to contain between about one base
and about several thousand bases of intron sequence adjacent
and 3' to the splice donor sequence.

5.3. Applications Of The Described Vectors

Vectors incorporating the described 3' gene trap
cassettes are characterized by a marked improvement in the
efficiency of 3' gene trapping. As such, another embodiment
of the present invention is a 3' gene trap cassette, and
vectors incorporating the same, that are characterized by the
capability of trapping 3' exons with at least about 15
percent of the efficiency with which a similarly situated
SAβgeo 5' gene trap cassette (or SAneo 5' gene trap cassette)
traps 5' exons, preferably at least about 25 percent, more
preferably at least about 40 percent, more preferably at
least about 60 percent, and most preferably at least about 85
percent. For the purposes of the present invention, a
similarly situated gene trap cassette is a cassette that is
present in a similar orientation within a similar vector.
Alternatively, similarly situated gene trap cassettes may
both be present in the same vector.

Any of a variety of quantitative measurements are
available to those skilled in the art and can be used to
calculate the relative efficiency of the respective 3' and 5'
gene trap cassettes as well as the number of genes that can
be effectively trapped. For example, one can determine the
percentage of target genes identified by the presently
described 3' gene trap cassette relative to the percentage of
target genes identified by 5' gene traps such as SAβgeo or
SAneo and selected using, for example, the antibiotic G418.
Alternatively, the percentage of identifiable 3' gene trap
events can be compared to the percentage of target cells
rendered antibiotic resistant or chromogenically identifiable
by SAβgeo-mediated 5' gene trap events.
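
As a minimal sketch of the first comparison described above,
the relative efficiency can be expressed as the percentage of
target genes identified by the 3' cassette divided by the
percentage identified by the 5' cassette, and then compared
against the 15, 25, 40, 60, and 85 percent levels recited
earlier in this section. The function names and the example
counts are hypothetical and are not data from the disclosure.

```python
# Efficiency levels (percent) recited for the 3' cassette relative to a
# similarly situated 5' cassette, highest first.
TIERS = (85, 60, 40, 25, 15)


def relative_efficiency(hits_3p: int, hits_5p: int, targets: int) -> float:
    """Return 3' trapping efficiency as a percentage of 5' efficiency.

    hits_3p / hits_5p: target genes identified by each cassette.
    targets: number of target genes assayed (same set for both cassettes).
    """
    if targets <= 0 or hits_5p <= 0:
        raise ValueError("targets and hits_5p must be positive")
    pct_3p = 100.0 * hits_3p / targets
    pct_5p = 100.0 * hits_5p / targets
    return 100.0 * pct_3p / pct_5p


def highest_tier(rel_eff: float):
    """Return the highest recited level met, or None if below 15 percent."""
    for tier in TIERS:
        if rel_eff >= tier:
            return tier
    return None


# Hypothetical counts: 30 of 200 target genes tagged by the 3' cassette,
# 60 of 200 tagged by the 5' cassette.
example = relative_efficiency(30, 60, 200)
print(example, highest_tier(example))
```

Because both percentages share the same denominator, the ratio
reduces to the ratio of hit counts; the percentages are kept
explicit only to mirror the wording of the comparison.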

The functional efficiency of the presently described 3'
gene trap cassette can also be quantified by the absolute
number of independent gene trap events characterized using
the vector. Generally, the presently described vectors allow
for the expedient trapping of at least about one to about
several hundred genes, typically at least about 1,000
different genes, more typically at least about 3,000,
preferably at least about 10,000 genes, more preferably at
least about 25,000 genes, more preferably at least about
50,000 genes, and most preferably at least about 55,000 genes
up to the maximum number of genes present in a given cell or
cell type. For example, murine cells are thought to encode
between about 60,000 and 100,000 genes or more.

Another measure of gene trapping efficiency is the
number of distinct cellular exons that can be trapped.
Typically, the presently described 3' gene trap cassette will
trap cellular 3' exons with sufficient efficiency to enable
the facile detection, screening, and identification of at
least about 10,000 distinct 3' gene trapped cellular exons
(generally representing between about 7,500 and
9,500 different genes; the number is typically smaller
because independent integration events can occur within
different introns/exons within the same gene), preferably at
least about 15,000 distinct 3' gene trapped cellular exons,
more preferably at least about 25,000 distinct 3' gene
trapped cellular exons, and most preferably at least about
50,000 distinct 3' gene trapped cellular exons up to between
about 70 and about 100 percent of the genes present in the
mammalian genome.

5.3.1. Gene Trapped Libraries Of Cells

Given the number of genes that can be rapidly
characterized using the present vectors, additional
embodiments of the present invention include gene trapped
libraries of cultured animal cells that stably incorporate
the presently described 3' gene trap cassette. The presently
described libraries may be made by a process comprising the
steps of treating (i.e., infecting, transfecting,
retrotransposing, or virtually any other method of
introducing polynucleotides into a cell) a population of
cells to stably integrate a vector containing the 3' gene
trap cassette, identifying or otherwise selecting for stably
transduced cells, and identifying the trapped 3' cellular
exons. In a preferred embodiment, the animal cell libraries
comprise mammalian cells, and in a particularly preferred
embodiment, the mammalian cells are embryonic stem (ES)
cells. Preferably, such libraries are constructed such that
each mutated cell in the library harbors a single
identifiable 3' gene trap vector/event (although mutated
cells harboring multiple gene trap vectors are also
contemplated by the present invention).

In an additional embodiment of the present invention,
the individual mutant cells in the library are separated and
clonally expanded. The isolated and clonally expanded mutant
cells are then analyzed to ascertain the DNA sequence, or
partial DNA sequence, of the insertionally mutated host gene.
Thus, the invention further provides for the sequencing of at
least a portion of every gene mutated in the library. The
resulting sequence database subsequently serves as an index
for the library. In essence, every group of clonally
expanded cells in the library is individually catalogued
using the partial sequence information. The resulting
sequence is specific for the mutated gene since the present
methods are designed to obtain sequence information from
exons that have been spliced to the 3' gene trap cassette.
The resulting sequence database can be used to identify the
mutated gene of interest, or, alternatively, represents a
powerful tool for the identification of novel genes. Once
identified, the corresponding mutant cell may be taken from
the library and studied further as described below.
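
The indexing scheme described above can be sketched, under
simplifying assumptions, as a lookup table keyed by sequence
tag. The clone identifiers, the example sequences, and the
function names below are hypothetical, and exact substring
matching stands in for whatever sequence comparison method
would actually be used to search such a database.

```python
from collections import defaultdict


def build_index(clones):
    """Map each gene trap sequence tag to the clone IDs that yielded it.

    clones: iterable of (clone_id, sequence_tag) pairs.
    """
    index = defaultdict(list)
    for clone_id, tag in clones:
        index[tag.upper()].append(clone_id)
    return dict(index)


def find_clones(index, query):
    """Return sorted clone IDs whose tag contains the query sequence.

    Exact substring matching is a stand-in for a real alignment search.
    """
    query = query.upper()
    hits = []
    for tag, clone_ids in index.items():
        if query in tag:
            hits.extend(clone_ids)
    return sorted(hits)


# Hypothetical library: two clones share the same trapped sequence.
library = [
    ("clone_001", "ATGGCCAAGTTGACC"),
    ("clone_002", "ATGTCTGGAAACCTT"),
    ("clone_003", "atggccaagttgacc"),
]
index = build_index(library)
print(find_clones(index, "GCCAAG"))
```

Keying the table on the sequence tag rather than on the clone
reflects the use described in the text: a gene of interest is
found by sequence first, and the matching mutant cell line is
then retrieved from the library.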

Generally, indexed libraries of isolated cells, or
individual cell types (e.g., ES cells), that have been
mutated using vectors incorporating the described 3' gene
trap cassette will comprise a collection of at least about 50
different isolated mutant cell culture lines, typically at
least about 100, more typically at least about 500,
preferably at least about 1,000, more preferably at least
about 5,000, more preferably at least about 10,000, more
preferably at least about 25,000, and even more preferably at
least about 40,000 up to about one to five hundred thousand
different isolated and characterized mutant cell culture
lines or more. Preferably, the genomes of the different
mutant cell cultures present in a given library are
essentially identical (e.g., derived from a common source or
inbred strain) except for the location of the inserted gene
trap cassette, or vector incorporating the same.

Ideally, the scope of mutagenesis is the entire set of
genes that can be trapped in the target cell line. By
increasing the redundancy of the library, the resulting
sequence database will ideally contain an essentially
complete representation of the genes that can be trapped in
the target cell. For the purposes of the present invention,
the term "essentially complete representation" shall refer to
the statistical situation where there is generally at least
about an 80-95 percent probability that the genomes of the
cells used to construct the library collectively contain a
stably inserted 3' gene trap cassette in at least about 70
percent of the genes that can be trapped in the target cell
genome, preferably at least about 85 percent, and most
preferably at least about 95 percent of the genes that can
be trapped as determined by a standard Poisson distribution
(and assuming that a given vector integrates into the genome
nonspecifically).
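
The Poisson reasoning referred to above can be illustrated
with a short calculation. The sketch assumes, as a
simplification not stated in the text, that every independent
insertion lands in one of G trappable genes with equal
probability; under that assumption the expected fraction of
genes hit at least once after N insertions is 1 - exp(-N/G).
The function names and the value of G are hypothetical, and
the result is an expected fraction rather than the 80-95
percent probability statement given in the definition.

```python
import math


def expected_coverage(insertions: int, trappable_genes: int) -> float:
    """Expected fraction of trappable genes hit at least once.

    Uses the Poisson zero-class: P(gene not hit) = exp(-N / G).
    """
    if trappable_genes <= 0:
        raise ValueError("trappable_genes must be positive")
    return 1.0 - math.exp(-insertions / trappable_genes)


def insertions_for_coverage(coverage: float, trappable_genes: int) -> int:
    """Smallest whole number of insertions giving the expected coverage."""
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be in [0, 1)")
    return math.ceil(-trappable_genes * math.log(1.0 - coverage))


G = 10_000  # hypothetical number of trappable genes
for target in (0.70, 0.85, 0.95):
    n = insertions_for_coverage(target, G)
    print(f"{target:.0%} expected coverage: {n} insertions "
          f"({n / G:.2f} per trappable gene)")
```

Real integration is not uniform across genes, so the numbers
produced by such a model are best read as lower estimates of
the redundancy a library would need.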

The broad genomic coverage afforded by the present
vectors also allows for the large-scale mutagenesis of the
target cell genome. Typically, such a library of mutated
target cells will comprise a collection of mutated cells, or
isolated cultures thereof, that collectively represent at
least one 3' gene trap mutation (mediated by the described 3'
gene trap cassette or vector comprising the same) in each
chromosome present in the target cell genome, preferably at
least about 2 to 3 independent gene trap mutations per
chromosome will be collectively present in the library, more
preferably at least about 10 independent gene trap mutations
per chromosome are represented, and most preferably at least
about 500 independent gene trap mutations per autosomal
chromosome (minus the sex chromosomes), and/or up to about 70
to 90 percent, or even an essentially complete representation
of the genes in the genome will be collectively represented
in the library.

The presently described invention allows for large-scale
genetic analysis of the genome of any organism/cell that can
be transduced with the described vectors or for which there
exist cultured cell lines. Accordingly, the described
libraries can be constructed from any type of cell that can
be transfected by standard techniques or transfected with a
recombinant vector harboring the described 3' gene trap
cassette. As such, the presently described methods of
making, organizing, and indexing libraries of mutated animal
cells are also broadly applicable to virtually any eukaryotic
cells that may be genetically manipulated and grown in
culture.

Where mouse ES cells are used to construct the library,
and preferably early passage ES cells, the library becomes a
genetic tool for the comprehensive functional study of the
mouse genome. Since ES cells can be injected back into a
blastocyst and incorporated into normal development and
ultimately the germ line, the mutated ES cells of the library
effectively represent a collection of mutant transgenic mouse
strains (see generally, U.S. Patent No. 5,464,764 issued
November 7, 1995, herein incorporated by reference).

A similar methodology can be used to construct virtually
any non-human transgenic animal (or animal capable of being
rendered transgenic), or transgenic plants. Such non-human
transgenic animals may include, for example, transgenic pigs,
transgenic rats, transgenic rabbits, transgenic cattle,
transgenic goats, and other transgenic animal species,
particularly mammalian species, known in the art.
Additionally, bovine, ovine, and porcine species, other
members of the rodent family, e.g., rat, as well as rabbit
and guinea pig and non-human primates, such as chimpanzee,
may be used to practice the present invention.

Transgenic animals and cells produced using the
presently described library and/or vectors are useful for the
study of basic biological processes and the development of
therapeutics and diagnostics for diseases including, but not
limited to, aging, cancer, autoimmune disease, immune
disorders, alopecia, glandular disorders, inflammatory
disorders, ataxia telangiectasia, diabetes, arthritis, high
blood pressure, atherosclerosis, cardiovascular disease,
pulmonary disease, degenerative diseases of the neural or
skeletal systems, Alzheimer's disease, Parkinson's disease,
asthma, developmental disorders or abnormalities,
infertility, epithelial ulcerations, and viral and microbial
pathogenesis and infectious disease (a relatively
comprehensive review of such pathogens is provided, inter
alia, in Mandell et al., 1990, "Principles and Practice of
Infectious Disease" 3rd ed., Churchill Livingstone Inc., New
York, N.Y. 10036, herein incorporated by reference). As
such, the described animals and cells are particularly useful
for the practice of functional genomics (similar libraries,
and methods of making and screening the same, are discussed
in U.S. Application Ser. No. 08/942,806, filed October 2,
1997, the disclosure of which is herein incorporated by
reference in its entirety).

5.3.2. The Acquisition Of DNA Sequence Information

The sequencing of cDNA libraries has provided many
hundreds of thousands of expressed sequence tags (ESTs).
These sequence tags are typically thought to identify genes
or the coding portion of DNA. Since genes are thought to
code for most, if not all, potential drug targets, there has
been a rush to obtain ESTs identifying all mammalian genes.
However, in spite of the wealth of sequence data generated
thus far, many genes have proven difficult to identify using
established cDNA methods because many genes are not
expressed, are expressed at very low levels, are expressed
only in specific cell types, or are only transiently
expressed. Given that gene trapping can identify genes
independent of their endogenous expression levels, gene
trapping is an important tool for gene discovery (as
demonstrated by the large number of novel sequences that have
been identified using the described vectors). As with EST
technology, one potential limitation of 5' gene trap vectors
(vectors designed to trap 5' exons) is that only expressed
genes are typically trapped. Accordingly, particularly for
the purposes of gene discovery, ES cells are especially
preferred target cells because ES cells are thought to be
generally promiscuous in the expression of most genes. Given
this promiscuity, most genes could be trapped in ES
cells using the presently described vectors. To test the
percentage of genes that can be detected as expressed in ES
cells, 23 ESTs from the GenBank dbEST database were selected
at random, and primers were synthesized that would identify
the genes by PCR. When these primers were used in RT-PCR
assays using ES cell RNA, all 23 sets of primers produced
product. This indicates that transcripts for all 23 genes
could be detected in ES cells. Given that the 23 ESTs
screened were selected at random, it is likely that they are
largely representative of genes in general and indicate that
a majority of genes that are expressed in other cell types at
sufficiently high levels to have been identified by
sequencing of conventional cDNA libraries are also expressed
in ES cells and are thus presumably identifiable using
SA-selectable marker-poly A (5' gene trap) vectors.
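
As a rough guide to how much the 23-of-23 result supports the
"majority of genes" statement above, one can compute a
one-sided lower confidence bound on the underlying proportion.
The calculation below is an illustrative addition, not part of
the reported experiment. It assumes the 23 ESTs behave as
independent random draws from the population of EST-identified
genes; if the true detectable fraction is p, the chance that
all n draws are detected is p^n, so the lower bound at
significance level alpha is alpha^(1/n).

```python
def lower_bound_all_successes(n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) lower bound on a proportion after n of n successes.

    Solves p**n = alpha for p, the smallest proportion still consistent
    with observing no failures at the chosen significance level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    return alpha ** (1.0 / n)


bound = lower_bound_all_successes(23)
print(f"95% lower bound on detectable fraction: {bound:.3f}")
```

The bound applies only to genes already represented by ESTs,
which is the population the 23 sequences were drawn from; it
says nothing about genes absent from cDNA libraries.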

However, in those instances where genes are either not
expressed or only poorly expressed, a 3' gene trap cassette
must be utilized to trap and identify the genes. In
addition, 3' gene trap cassettes enable the rapid procurement
of DNA sequence data from the trapped gene by automated
means.

Vectors designed to trap 3' exons have made it possible
to produce large numbers of mutations and rapidly identify
the genes that have been mutated. However, a limitation of
initial versions of such vectors is that selectable marker
genes used in the 3' gene trap are inefficiently utilized by
the splicing machinery of most eukaryotic cells. As a
consequence, vectors employing a 3' gene trap cassette that
employs an exon encoding an activity conferring antibiotic
resistance only allow the facile and efficient gene trapping
and identification (using 3' RACE) of a relatively small
proportion of the genes in the genome. Additionally, the
inherent inefficiency of selecting for trapped 3' exons
limits the total number of genes that can be analyzed using
such methods. Consequently, prior to the present invention,
only a small portion of the cellular genome had been
effectively trapped/mutagenized using antibiotic
selection-mediated 3' exon trapping.

The presently described vectors incorporate a 3' gene
trap cassette that typically allows a several fold to more
than an order of magnitude greater number of genes to be
trapped and identified by exon sequence as compared to
initial 3' gene trap vectors that utilize an exon encoding a
selectable marker activity.

The presently described vectors can also incorporate 3'
and/or 5' gene trap cassettes that are engineered to increase
the probability of identifying the 5' ends of the open
reading frames of genes. This is significant because the 5'
ends of genes often code for the signal sequence that is
found in secreted and transmembrane proteins. This group of
genes is highly enriched for potential protein therapeutics
and drug targets. Given that 5' noncoding sequences average
about 100 bp in length and the average length gene trap
sequence is about 500 bp, gene trapped sequences generated
using the presently described vectors will typically identify
the 5' portion of the tagged open reading frame. This is
especially valuable since 5' ends of genes can be difficult
to obtain due to complicating factors such as high GC
content, secondary structure, and reverse transcriptase's
lack of processivity.

When a large number of gene traps in known genes were
made and identified using the described vectors, 93% of the
gene trap sequence tags that matched cDNA sequences in
GenBank contained the same or additional 5' sequence. This
confirms that the described 3' gene trap cassette can be used
to identify and characterize the 5' termini of genes. In
fact, the gene trap methods of the present invention identify
the 5' end of genes as well as or better than other methods
described to date.

One of the major challenges in the field of genomics
remains the isolation and cloning of full length cDNAs for
all genes. To date, this has required the production of cDNA
from a wide variety of tissues, followed by the subsequent
sequencing of the individual cDNAs. As described above,
using such methods it can be very difficult to obtain the 5'
ends of cDNAs. Additionally, there is the problem that in
order to obtain a complete repertoire of cDNAs, individual
cDNA libraries must be made from essentially every
differentiated cell type and at every developmental time
point because genes must be expressed in order to be cloned
as ESTs.

As discussed above, the presently described vectors can
be used for the creation of cDNA libraries. When introduced
to cells in culture, the 3' gene trap cassette produces
transcripts of genes independent of whether or not they are
normally expressed in that cell type. The expression levels
of the various trapped genes are normalized by the inserted
promoter so that even genes that are only expressed at very
low levels are identified. Using the presently described
methods and vectors, one can obtain broad cDNA coverage of
the target cell genome from a single library without having
to independently produce multiple cDNA libraries from
multiple cell types that were grown under multiple
conditions.

The presently described 3' gene trap cassette can be
inserted into the genome of tissue culture cells, for
example, and methods (e.g., PCR) can be used that only allow
cDNA arising from trapped genes to be subcloned into the cDNA
library. These methods will increase coverage of the cDNAs
produced while substantially decreasing the labor involved to
produce the libraries. As discussed above, the presently
described methods are also particularly useful in obtaining
the 5' ends of genes, and thus optimize the chances of
obtaining full length cDNAs. Examples of variables that can
be used to alter the variety and number of trapped cDNAs
produced using the described vectors include, but are not
limited to, adjusting the multiplicity of infection, and
producing cDNAs from infected target cells that have not been
subject to a period of selective culture in order to select
for cells incorporating and expressing an exogenously
introduced selectable marker. The resulting gene trapped
cDNA libraries can be sequenced to produce a multiplicity of
gene trapped coding regions of genes that can be used for
bioinformatics, gene expression studies both in situ and in
vitro (e.g., hybridization studies, gene chips (which can also
use oligonucleotide sequences corresponding to the trapped
gene sequences), etc.), and the production of gene trap
sequence databases from a variety of animals and plants.
These gene trap sequences can be utilized as probes directly,
or oligonucleotide sequences corresponding to the gene trap
sequences can be used to screen libraries by hybridization or
PCR. Also, gene trap sequences identified using the
disclosed vectors can be incorporated into cloning vectors
that direct the expression of the gene trap sequences. For
the purposes of the present disclosure, an isolated
polynucleotide sequence having, containing, or otherwise
incorporating such a gene trap sequence (or an
oligonucleotide sequence derived therefrom) shall mean any
and all isolated polynucleotides or vectors minimally
incorporating, or comprising, a contiguous stretch of the
described cDNA gene trap sequence (or an oligonucleotide
sequence derived therefrom) inclusive of any additional
naturally occurring or recombinant sequences that may flank
the described gene trap sequence present in such isolated
polynucleotides or vectors.

Given the speed and efficiency with which DNA (and
corresponding amino acid) sequence information can be
obtained using the described methods and vectors, it is clear
that they provide important tools for conducting genetic
screens in any cell (including primary and secondary cells)
or cell line that contains splicing machinery and genes
containing introns. The presently described gene trap
vectors represent a particularly important technological
breakthrough because the described 3' gene trap cassette
allows for the rapid identification of roughly 13 fold (as
empirically determined) more genes than can be efficiently
obtained using conventional 3' gene trap vectors that rely
upon gene trapping as detected by antibiotic selection.
Combined with the frequency of obtaining novel gene
sequences, the observed increase in identifiable gene trap
targets will provide sequence information for large numbers
of novel genes and gene sequences. Additionally, when ES
cells are targeted, each of these novel sequences represents
both a newly identified gene (and potential drug or drug
target) and a "knockout" cell and a potential "knockout"
embryo or animal.

The rapid sequence acquisition features of the presently
described methods, libraries, cells, and animals are well
suited for rapidly identifying the molecular/genetic basis
for disease as well as genetically determined advantages such
as prolonged life-span, low cholesterol, low blood pressure,
resistance to cancer, low incidence of diabetes, lack of
obesity, or the attenuation of, or the prevention of, all
inflammatory disorders, including, but not limited to,
coronary artery disease, multiple sclerosis, rheumatoid
arthritis, systemic lupus erythematosus, and inflammatory
bowel disease. Given the wide coverage provided by the large
number of target genes, a particularly useful application of
the described techniques involves the characterization and
analysis of coding region single nucleotide polymorphisms
(cSNPs).


5.4. Methods Of Introduction 

The presently described 3' gene trap cassette is
preferably introduced into target cells as a structural
component of any of a wide range of vectors that can be
specifically or nonspecifically inserted into the target cell
genome (recombinase systems can also be used to insert the 3'
gene trap cassette). Suitable vectors that can be used in
conjunction with the presently disclosed features include,
but are not limited to, herpes simplex virus vectors,
adenovirus vectors, adeno-associated virus vectors,
retroviral vectors, lentiviral vectors, pseudorabies virus,
alpha-herpes virus vectors, and the like. A thorough review
of viral vectors, particularly viral vectors suitable for
modifying nonreplicating cells, and how to use such vectors
in conjunction with the expression of polynucleotides of
interest can be found in the book Viral Vectors: Gene Therapy
and Neuroscience Applications, Eds. Kaplitt and Loewy, Academic
Press, San Diego (1995).

Where retroviral vectors are used to deliver the
presently described 3' gene trap cassette, the retroviral
vectors can be used in conjunction with retroviral packaging
cell lines such as those described in U.S. Patent No.
5,449,614 ("'614 patent") issued September 12, 1995, herein
incorporated by reference. Where non-mouse animal cells are
to be used as targets for generating the described libraries,
packaging cells producing retrovirus with amphotropic
envelopes will generally be employed to allow infection of a
broad range of host cells. Alternatively, pantropic
packaging cell lines such as, but not limited to, the cell
line 293/GPG (Ory et al., 1996, Proc. Natl. Acad. Sci., USA,
93:11400-11406, and U.S. Applic. Ser. No. 08/651,050, herein
incorporated by reference) can be used to package the
described vectors, or a suitable viral, e.g., retroviral,
receptor gene can be transfected into the non-murine, e.g.,
human, target cells.

Additionally, the described retroviral vectors can be
packaged in conjunction with chimeric integrase molecules as
described in U.S. Application Ser. No. 08/907,598, herein
incorporated by reference. Typically, the LTRs used in the
construction of the packaging cell lines are
self-inactivating. That is, the enhancer element is removed from
the 3' U3 sequences such that the proviruses resulting from
infection would not have an enhancer in either LTR. An
enhancer in the provirus may otherwise affect transcription
of the mutated gene or nearby genes. Typically, the gene
trap cassettes of the described retroviral vectors are
present in an orientation opposite the normal functional
orientation of the retroviral LTRs.

An additional advantage of using viral, and particularly
retroviral, infection (e.g., biological methods) to deliver
recombinant viral vectors incorporating, inter alia, the 3'
gene trap cassette is that viral infection is more efficient
than standard nonbiological methods of delivering genetic
material to target cells. Where recombinant genetic material
is delivered by retroviral infection, the recombinant RNA
genome of the retrovirus is reverse transcribed within the
target cell, and the retroviral integrase packaged within the
infecting virus subsequently mediates the essentially
non-specific integration of the vector (and 3' gene trap
cassette) into the target cell genome. Accordingly,
additional embodiments of the present invention include
methods of inserting recombinant vectors incorporating the
described 3' gene trap cassette that are mediated by
integrase or recombinase activities that are either
exogenously added to the target cell, or do not naturally
occur within the target cell.

Representative retroviral vectors that can be adapted to
incorporate the presently described 3' gene trap cassette are
described, inter alia, in U.S. Patent No. 5,521,076, and U.S.
Applications Ser. Nos. 08/942,806, filed October 2, 1997, and
08/907,598, filed August 8, 1997 (which further disclose
screening protocols that can be used to assay for specific
gene trap events either biochemically or phenotypically), the
disclosures of which are herein incorporated by reference.

Typically, the orientation of the gene trap cassettes
incorporated into retroviral vectors is opposite to that of
normal retroviral transcription; however, retroviral vectors
are also contemplated where one or more gene trap cassettes
are incorporated in the same orientation as normal retrovirus
transcription. Typically, the reason for placing a gene trap
cassette in an opposite orientation relative to the LTRs is
that the presence of engineered control elements such as
polyadenylation signals, splice sites, and promoters can
interfere with the proper transcription of the retroviral
genome in the packaging cell line, and subsequently reduce
retroviral titers.

Additionally, since a 'cryptic' splice donor sequence is
found in the inverted LTRs, this splice donor can be removed
by site specific mutagenesis so that it does not adversely
affect trapping related splicing events. Optionally, the LTR
promoter and/or enhancer function can be inactivated by
deleting all or a portion of the promoter and/or enhancer
sequences.

5.5. Molecular Genetic Applications 
5.5.1. Gene Activation 

Another embodiment of the present invention is the
use of the 3' gene trap cassette to screen for either gain or
loss of function in animals, e.g., mice, and cultured cells.
When vectors are used that incorporate a 3' gene trap having
an exon that lacks a translation start site, a given target
gene can be either over expressed or insertionally
inactivated (mutated) depending on where the vector has
integrated within the gene. If the vector lands in an intron
preceding the start of translation, it can cause over
expression of the full open reading frame encoding the
cellular protein. Using these types of trapping events one
can conduct genetic screens based upon gene over expression.
These screens could be done in cell culture or in mice, for
example, in order to discover genes that play significant
roles in disease processes. For example, these screens could
be used to identify oncogenes by introducing the 3' gene trap
cassette into primary embryo fibroblasts and selecting for an
ability to grow in soft agar. Alternatively, assaying for
cells able to escape cellular senescence would also allow the
identification of potential oncogenes.
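
The position dependence described above can be illustrated
with a minimal sketch. The gene model used here (the Gene
class, the predict_outcome function, plus-strand coordinates,
and the example positions) is hypothetical and is not part of
the described vectors. It assumes a 3' gene trap exon that
lacks a translation start site and only classifies intronic
insertions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical plus-strand gene model (illustrative coordinates only)."""
    name: str
    exons: List[Tuple[int, int]]  # inclusive (start, end) of each exon, 5' to 3'
    translation_start: int        # genomic position of the start codon


def predict_outcome(gene: Gene, insertion: int) -> str:
    """Classify an insertion of a 3' gene trap whose exon lacks a start site."""
    gene_start, gene_end = gene.exons[0][0], gene.exons[-1][1]
    if not gene_start <= insertion <= gene_end:
        return "outside gene"
    if any(start <= insertion <= end for start, end in gene.exons):
        return "exon insertion (not modeled)"
    # Intronic insertion: the outcome depends on which side of the
    # translation start site the vector has landed.
    if insertion < gene.translation_start:
        return "over expression"
    return "insertional mutation"


gene = Gene("example", exons=[(100, 200), (500, 650), (900, 1100)],
            translation_start=520)
for position in (50, 300, 700):
    print(position, predict_outcome(gene, position))
```

In this toy model the insertion at position 300 lies in the
intron preceding the start codon and is scored as over
expression, whereas the insertion at position 700 lies
downstream of the start codon and is scored as a mutation.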

In order to demonstrate that the present vectors can be
used to select for trapping events that result in gene
expression (or over expression), an experiment was conducted
to determine whether genes could be trapped that allow
expression of factors that promote ES cell differentiation.
Large numbers of genes were trapped in cell culture on tissue
culture plates. Multiple plates were infected in parallel
and the resulting plates were observed for ES cell
differentiation. Some plates showed almost no
differentiation whereas some plates had 100%
differentiated ES cells. This differentiation is likely the
result of the expression of a gene that is either a
differentiation factor or causes the ES cells to produce a
differentiation factor and secrete it into the media, resulting
in differentiation of all the cells on the dish.

Importantly, this also demonstrates that the 3' gene trap
system can be used to activate and screen for secreted
molecules that produce specific biological responses by
testing supernatants of the gene trap pools. Screening for
ES cell differentiation factors is one example, but this
technique can be used to identify secreted molecules involved
in any cellular response of interest. One could, for example,
screen for secreted molecules that induce apoptosis or
hematopoietic cell differentiation.

Given the increased expression afforded by the presently
described 3' gene trap cassette, an additional application of
the presently described 3' gene trap cassettes is gene
activation. For example, after suitable animal cells are
treated or infected with vectors that incorporate the
described 3' gene trap cassette, if the vector integrates
into the 5' intron of an otherwise quiescent gene, the gene
can be "activated" and over expressed by the regulatory
elements, e.g., enhancer/promoter elements, incorporated into
the 3' gene trap cassette. Using such nontargeted,
nonspecific, or biased nonspecific (see U.S. Applic. Ser. No.
08/907,598) gene activation, modified animal cells, including
human cells, can be produced that over express any of a wide
variety of natural cellular products.

Products that are particularly deemed useful for such
application include normally secreted molecules or hormones
such as, but not limited to, erythropoietin (epo), tPA,
cytokines, interleukins, tumor suppressors, chemokines,
secreted molecules, G-CSF, GM-CSF, nerve growth factor (NGF),
ciliary neurotrophic factor (CNTF), brain-derived neurotrophic
factor (BDNF), interleukins 1-2 and 4-14, tumor necrosis
factor-α (TNF-α), α or γ interferons and the like, leptin,
and factors VIII and IX.

The activation of quiescent genes, over expression, or
abnormal expression of genes by the 3' gene trap cassette can
also be used to study gene function within an organism. Gene
over expression may be used to study gene function, and by
trapping genes with the 3' cassette, genes can be over
expressed within an organism. The over expression may cause
a phenotype in the organism that sheds light on the function
of the gene. For example, the specifically described
retroviral vector contains the PGK promoter, which is
ubiquitously expressed. When a gene is trapped in ES cells
and the ES cells are subsequently used to make mice, the mice
will over express the trapped gene ubiquitously. Further
modifications could be made, for instance, to use a promoter
that is tissue-specific rather than the PGK promoter in order
to over express the trapped gene in a tissue-specific manner.
The albumin promoter could be used for liver-specific over
expression. Additionally, a signal sequence could be added
to the 3' trapping cassette to cause secretion of the trapped
gene's protein product from the cell into the extracellular
space, into the bloodstream, or into mammary excretions. This
could facilitate the understanding of gene function.

Since over expression is one possible outcome of a gene
trap event using the 3' gene trap cassette, it could prove
useful to be able to remove the 3' trap/over expression
component. This can be accomplished by flanking any
essential component of the 3' trap cassette (essential
components may include the promoter, the exon, the splice
donor, the intronic sequence, or the entire cassette) with
recombinase sites such as those recognized by the flp or cre
recombinases. In this way, the addition of the corresponding
recombinase in cells or in the organism allows one to
conditionally reverse or remove over expression as desired.

For gene activation, a generic 3' gene trap cassette can
be employed that incorporates an exon that is native to, or
compatible with the biology of, the target cell, or a
specific 3' gene trap cassette can be constructed that
utilizes a specific exon and splice donor site from a known
gene. Optionally, given that gene activation using 3' gene
traps typically requires that the vector integrate or insert
upstream (5') from the translation start site of the
activated gene, the gene activation exon will preferably not
incorporate a functional translation start site (IRES or
Kozak sequence), or will only incorporate a nominally
functional (or cryptic) translation start site capable of
mediating only incidental levels of translational activity.
Alternatively, the incorporation of an internal ribosome
entry site into the exon can result in the over expression of
the 3' gene trapped, or activated, gene.

Where a fusion product between the 3' gene trap exon and
a downstream cellularly encoded exon (e.g., that only encodes
a particular domain of the protein product of the "activated"
gene) is desired, the gene trap vector will typically
incorporate a functional translation start site or internal
ribosome entry site and translation start site.

Alternatively, in those instances where the described
vectors integrate downstream from the translation start site,
the gene will be mutated, and screens to detect such loss of
function can be employed. An example of this approach would
be to mutate fibroblasts, for example, with the present
vectors and screen for hits that allow growth in soft agar.
In this way genes encoding tumor suppressors could be
identified. Although only 1 of 2 alleles will typically be
trapped, the genome of cells in culture is often unstable
and, through selection, events can be found in which the
second allele is lost. This makes it possible to also screen
for recessive phenotypes.

5.5.2. Function-Based Gene Discovery 

The gene activation capabilities of the presently
described vectors have further application for selective gene
discovery. For example, proliferation deficient cells
(e.g., tumor suppressor or DNA repair knockout cells, etc.)
can be infected with the presently described gene activation
vectors. The infected cells can subsequently be screened for
cells/colonies that display a partially or fully corrected
proliferation phenotype. When cells displaying the corrected
phenotype are identified, the "activated" genes responsible
for correcting the proliferation deficient phenotype can be
rapidly identified by DNA sequencing using, for example, 3'
RACE. Typically, genes that partially or fully correct a DNA
repair mutation (mutations often associated with cancer in
animals and humans) are more likely to encode a tumor
suppressor, or possibly oncogene, activity (see generally,
Selten et al., 1985, EMBO J., 4(7):1793-1798).

Conversely, cancerous or transformed cells (or cell
lines) can be infected with the described gene activation
vectors and subsequently subjected to various cytotoxic agents
that are toxic to growing, or rapidly growing, cells (see
generally Wilson et al., 1986, Cell, 44:477-487; Stephenson
et al., 1973, J. Virol., 11:218-222; Sacks et al., 1979,
Virology, 57:231-240; Inoue et al., 1983, Virology,
125:242-245; Norton et al., 1984, J. Virol., 50:439-444; Cho et al.,
1976, Science, 194:951-953; Steinberg et al., 1978, Cell,
13:19-32; Maruyama et al., 1981, J. Virol., 37:1028-1043;
Varmus et al., 1981, Cell, 25:23-26; Varmus et al., 1981,
Virology, 108:28-46; Mathey-Prevot et al., 1984, J. Virol.,
50:325-334; and Ryan et al., 1985, Mol. Cell. Biol.,
5:3477-3582). Preferably, the infected cells are exposed to the
cytotoxic or chemotherapeutic agents under conditions where
cells that have reverted to a non-transformed phenotype are
contact inhibited, and are less susceptible to cytotoxic
agents present in the culture medium. This further
contributes to the preferential elimination of rapidly
growing or transformed cells and, after several cycles, the
eventual isolation of cells that have partially or fully
reverted to the noncancerous or nontransformed phenotype.
The "activated" genes responsible for correcting the
transformed phenotype, or suppressing the tumorigenic
phenotype, can subsequently be rapidly identified by DNA
sequencing using the described 3' RACE protocols.

The presently described methods are also useful for
identifying the genetic basis of cancer. Cancers that may be
studied, and potentially corrected, using the presently
described methods include, but are not limited to: Cardiac:
sarcoma (angiosarcoma, fibrosarcoma, rhabdomyosarcoma,
liposarcoma), myxoma, rhabdomyoma, fibroma, lipoma and
teratoma; Lung: bronchogenic carcinoma (squamous cell,
undifferentiated small cell, undifferentiated large cell,
adenocarcinoma), alveolar (bronchiolar) carcinoma, bronchial
adenoma, sarcoma, lymphoma, chondromatous hamartoma,
mesothelioma; Gastrointestinal: esophagus (squamous cell
carcinoma, adenocarcinoma, leiomyosarcoma, lymphoma), stomach
(carcinoma, lymphoma, leiomyosarcoma), pancreas (ductal
adenocarcinoma, insulinoma, glucagonoma, gastrinoma,
carcinoid tumors, vipoma), small bowel (adenocarcinoma,
lymphoma, carcinoid tumors, Kaposi's sarcoma, leiomyoma,
hemangioma, lipoma, neurofibroma, fibroma), large bowel
(adenocarcinoma, tubular adenoma, villous adenoma, hamartoma,
leiomyoma); Genitourinary tract: kidney (adenocarcinoma,
Wilms' tumor [nephroblastoma], lymphoma, leukemia), bladder
and urethra (squamous cell carcinoma, transitional cell
carcinoma, adenocarcinoma), prostate (adenocarcinoma,
sarcoma), testis (seminoma, teratoma, embryonal carcinoma,
teratocarcinoma, choriocarcinoma, sarcoma, interstitial cell
carcinoma, fibroma, fibroadenoma, adenomatoid tumors,
lipoma); Liver: hepatoma (hepatocellular carcinoma),
cholangiocarcinoma, hepatoblastoma, angiosarcoma,
hepatocellular adenoma, hemangioma; Bone: osteogenic sarcoma
(osteosarcoma), fibrosarcoma, malignant fibrous histiocytoma,
chondrosarcoma, Ewing's sarcoma, malignant lymphoma
(reticulum cell sarcoma), multiple myeloma, malignant giant
cell tumor, chordoma, osteochondroma (osteocartilaginous
exostoses), benign chondroma, chondroblastoma,
chondromyxofibroma, osteoid osteoma and giant cell tumors;
Nervous system: skull (osteoma, hemangioma, granuloma,
xanthoma, osteitis deformans), meninges (meningioma,
meningiosarcoma, gliomatosis), brain (astrocytoma,
medulloblastoma, glioma, ependymoma, germinoma [pinealoma],
glioblastoma multiforme, oligodendroglioma, schwannoma,
retinoblastoma, congenital tumors), spinal cord
(neurofibroma, meningioma, glioma, sarcoma); Gynecological:
uterus (endometrial carcinoma), cervix (cervical carcinoma,
pre-tumor cervical dysplasia), ovaries (ovarian carcinoma
[serous cystadenocarcinoma, mucinous cystadenocarcinoma,
endometrioid tumors, celioblastoma, clear cell carcinoma,
unclassified carcinoma], granulosa-thecal cell tumors,
Sertoli-Leydig cell tumors, dysgerminoma, malignant
teratoma), vulva (squamous cell carcinoma, intraepithelial
carcinoma, adenocarcinoma, fibrosarcoma, melanoma), vagina
(clear cell carcinoma, squamous cell carcinoma, botryoid
sarcoma [embryonal rhabdomyosarcoma]), fallopian tubes
(carcinoma); Hematologic: blood (myeloid leukemia [acute and
chronic], acute lymphoblastic leukemia, chronic lymphocytic
leukemia, myeloproliferative diseases, multiple myeloma,
myelodysplastic syndrome), Hodgkin's disease, non-Hodgkin's
lymphoma [malignant lymphoma]; Skin: malignant melanoma,
basal cell carcinoma, squamous cell carcinoma, Kaposi's
sarcoma, moles, dysplastic nevi, lipoma, angioma,
dermatofibroma, keloids, psoriasis; Breast: carcinoma and
sarcoma; and Adrenal glands: neuroblastoma.

Modifications to the above studies include the use of
retroviral gene trapping vectors in conjunction with a
chimeric integrase that targets, or biases, retroviral
integration to genes regulated by specific control sequences
or transcription factors. For example, the presently
described retroviral gene activation vectors can be packaged
into a virus incorporating a p53-chimeric integrase (as
described in U.S. Applic. Ser. No. 08/907,598) that
preferentially targets vector-mediated gene activation to
genes regulated by this known tumor suppressor activity.

Appropriately modified, the presently described vectors
additionally provide a vehicle for placing virtually any DNA
sequence throughout the target cell genome and rapidly
identifying where the vectors have integrated. A growing
number of DNA sequences have been identified that one might
wish to place throughout the genome. Examples of such
sequences include recombination sites such as frt sites or
lox P sites, respectively recognized by the flp and cre
recombinases. Although these sites can be placed throughout
the genome by homologous recombination or other
transformation methods, the present invention allows for the
rapid identification and cataloging of the integration sites
using automated processes. These recombination sites can be
used for specific DNA insertion or, along with insertions in
other positions, can be used to create chromosomal
rearrangements such as inversions, deletions, and
translocations. Thus the presently described vectors are
particularly useful for studying gene function through
chromosomal rearrangements. Other sequences one might wish
to place throughout the genome include, but are not limited
to, tet, ecdysone, or estrogen receptor DNA binding sites or
response elements. These sites are commonly used for
inducing or repressing gene expression, and placing these
sites throughout the genome, preferably in tens of thousands
of different genes, will provide an opportunity to create
conditional or tissue-specific regulation of gene expression.
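
How two recombination sites on the same chromosome give rise
to a deletion or an inversion can be sketched as follows.
This is a simplified illustration under stated assumptions:
the chromosome is represented as a list of labeled elements
with an orientation of +1 or -1, and the recombine function
and element names are hypothetical. Translocations between
sites on different chromosomes are not modeled.

```python
from typing import List, Tuple

Element = Tuple[str, int]  # (label, orientation): +1 forward, -1 reverse


def recombine(chrom: List[Element], i: int, j: int):
    """Recombine the two sites at indices i < j of a linear element list.

    Returns (chromosome_after, excised_elements).
    """
    if not 0 <= i < j < len(chrom):
        raise ValueError("need 0 <= i < j < len(chrom)")
    if chrom[i][1] == chrom[j][1]:
        # Directly repeated sites: the intervening DNA is deleted from the
        # chromosome, which keeps one site; the other site leaves with the
        # excised piece.
        return chrom[:i + 1] + chrom[j + 1:], chrom[i + 1:j + 1]
    # Inverted sites: the intervening DNA is flipped in place.
    flipped = [(label, -orient) for label, orient in reversed(chrom[i + 1:j])]
    return chrom[:i + 1] + flipped + chrom[j:], []


inverted = [("exon1", 1), ("lox", 1), ("trap", 1), ("lox", -1), ("exon2", 1)]
direct = [("exon1", 1), ("frt", 1), ("trap", 1), ("frt", 1), ("exon2", 1)]
print(recombine(inverted, 1, 3))  # trap is flipped between the sites
print(recombine(direct, 1, 3))    # trap is excised, one site remains
```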

An additional feature of the described mutagenesis
strategy is that vector encoded sequences and structural
features can be exploited to allow the rapid identification
of genomic DNA directly flanking the integrated gene trap
constructs. This approach exploits the fact that exon
sequence identifying the gene into which the construct has
integrated is accessible via the sequence acquisition
capabilities of the 3' gene trap cassette. Oligonucleotides
that hybridize to suitably identified (by bioinformatics)
cellular exons can be used in conjunction with
oligonucleotides that hybridize to vector encoded sequence in
PCR reactions that produce templates that can be cloned, or
directly sequenced, to identify the integration site. Where
PCR might not prove wholly suitable, PCR reactions can be
augmented by using vectors that have been engineered to
incorporate a relatively rare cutter restriction site, e.g.,
Sfi I, etc. Such restriction sites can be exploited to
subclone the PCR products, or even genomic sequence flanking
the vector, into suitable cloning vectors, or libraries
thereof, that can subsequently be used to, for example,
identify vector integration sites using established methods,
e.g., PCR, long-range PCR, cycle sequencing, etc.

Another aspect of the present invention places a gene
encoding a recombinase activity (e.g., flp or cre, etc., see
U.S. Patents Nos. 5,654,182 and 4,959,317, herein incorporated
by reference) into the vector containing the described 3'
gene trap cassette. The recombinase gene can be expressed in
a manner similar to that described for the marker genes,
supra. In brief, the recombinase can be expressed from an
independent expression cassette, can be incorporated into a
5' gene trap, or can be expressed from a vector promoter.
Depending on the strategy employed to express the
recombinase, it can be present on a separate construct, or in
the vector either 5' or 3' from the 3' gene trap cassette.
By incorporating the recombinase gene into the described gene
trap vectors, a collection, or library, of mutated cells can
be obtained that express the recombinase in essentially the
same pattern as the various trapped genes. The above
discussion describes just a few examples of how the presently
described vectors can be used to place any DNA sequence
throughout the genome in a manner that allows for the rapid
identification of where the vectors have integrated into the
target cell genome. Those skilled in the art will appreciate
that the described vectors constitute technology of broad
applicability to the field of eukaryotic molecular genetics.
As such, any of a wide variety of vectors and genetic
applications are contemplated as within the scope of the
present disclosure. For example, retroviral vectors can be
designed that contain a 3' gene trap cassette without the
other described features. Additionally, 3' gene traps can be
designed with tandem promoters where one of the promoters
is inducible. Alternatively, hybrid gene traps are also
contemplated where, for example, the SAneo from the described
5' gene trap has been fused, preferably in-frame, to the exon
of the described 3' gene trap cassette (i.e., deleting the pA
and promoter sequences). Such a construct takes advantage of
both the enhanced SA and SD functions of the described gene
trap cassettes, and allows for the automated identification
of the genes expressed in a given target cell.

5.5.3. Conditional Mutagenesis 

Another aspect of the present invention is the
ability to produce mutations that can be switched on and off
temporally and spatially in cells or in an organism or
animal. The ability to mutate a gene only in a specific
place or at a specific time has important implications for
understanding gene function. For example, the orientation of
SAβgeo within an intron regulates its ability to trap, and
thus mutate, the normal transcript produced by the trapped
gene. Suitably oriented frt recombinase sites can be used in
conjunction with flp recombinase to effect the above genome
rearrangements (i.e., "flip", or even remove, the gene trap
cassette and thus turn the mutation "on" or "off").
Alternatively, the cre/lox system, for example, can also be
employed to produce conditional mutations where a given
mutagenic construct can be selectively modified (replaced,
flipped, deleted, etc.) only in tissues or cells expressing
the cre recombinase.

To validate the above concept, a vector was constructed
that placed the SAβgeo cassette between two inverted lox
sites. These sites are recognized by the cre recombinase,
which can effectively flip DNA sequences located in between
the lox sites. A retroviral vector containing SAβgeo flanked
by inverted lox sites was integrated into an intron of the
HPRT gene by homologous recombination. When SAβgeo was
present in the forward orientation, HPRT function was
abolished as demonstrated by survival of cells in the
presence of 6-thioguanine. However, when cre recombinase was
expressed in these cells, the orientation of SAβgeo was
flipped to the reverse orientation and HPRT function was
regained as demonstrated by growth of cells in HAT containing
medium. Thus, the HPRT gene was effectively switched off or
on by flipping the orientation of SAβgeo. Accordingly, an
additional embodiment of the present invention is drawn to
vectors that enable the selective and reversible modulation
of gene expression. Using a similar methodology, gene trap
mutations can also be made conditional or tissue-specific by
linking recombinase expression, and hence the flipping of
SAβgeo, for example, to various stimuli/control elements. It
is also possible to engineer an allelic series using a
recombinase-mediated strategy to "swap" in or out, i.e.,
engineer, any of a variety of more or less mutagenic
constructs (appropriately flanked by lox or frt sites).
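
The logic of the HPRT validation experiment can be summarized
in a short sketch. The function names and the string labels
for orientation and medium are hypothetical and serve only to
restate the reported observations: a forward cassette
abolishes HPRT function (survival in 6-thioguanine), and a
cre-mediated flip to the reverse orientation restores it
(growth in HAT medium).

```python
def hprt_functional(orientation: str) -> bool:
    """Forward SAβgeo traps the HPRT transcript; reverse leaves it intact."""
    if orientation not in ("forward", "reverse"):
        raise ValueError("orientation must be 'forward' or 'reverse'")
    return orientation == "reverse"


def cre_flip(orientation: str) -> str:
    """Cre acting on inverted lox sites flips the cassette between them."""
    return "reverse" if orientation == "forward" else "forward"


def survives(medium: str, orientation: str) -> bool:
    """Predict survival of cells carrying the cassette in a selective medium."""
    functional = hprt_functional(orientation)
    if medium == "6-TG":
        # 6-thioguanine selects against cells with HPRT activity.
        return not functional
    if medium == "HAT":
        # HAT medium selects for cells with HPRT activity.
        return functional
    raise ValueError("medium must be '6-TG' or 'HAT'")


state = "forward"
print(state, survives("6-TG", state), survives("HAT", state))
state = cre_flip(state)
print(state, survives("6-TG", state), survives("HAT", state))
```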

An alternative strategy for using the presently
described vectors for tissue-specific or regulatable
expression is to place specific DNA binding sites such as frt
or lox sites within the LTRs. With lox sites in the LTRs,
once an insertion is made and identified, the cre
recombinase, for example, can be added and used to remove the
entire insert except for one LTR containing a single frt or
lox site. Additionally, a DNA response element that allows
regulatable gene expression can be incorporated, wholly or in
part, in conjunction with the recombinase sites. When the
vector or gene trap insert is removed by the recombinase
activity, the same recombination event that results in the
production of the single LTR will also produce a functional
DNA response element. This single LTR does not interfere
with gene function, but the DNA element can be used to
modulate gene expression. Typical DNA elements or operators
used for modulating eukaryotic gene expression include the
tet, ecdysone, or estrogen DNA binding sites. The presence of
the tet operator in combination with the tet repressor
protein would allow the expression of the gene to be
modulated up and down. This can be carried out in mice by
breeding the line of mice carrying the LTR insertion with
lines of mice expressing the tet repressor either
ubiquitously or only in specific tissues.
Another embodiment of the present invention is based on
the fact that the flp recombinase, for example, can mediate
the replacement of frt flanked integrated vector sequences
with exogenously added frt flanked sequences. Accordingly,
once a suitably constructed vector (incorporating flanking
recombinase sites) is incorporated into a given region of the
target cell genome, virtually any of a wide variety of DNA
sequences (i.e., promoters, enhancers, IRES, response
elements, etc.) that also incorporate the same flanking
recombinase sites can be exchanged into or out of the vector
by employing the proper recombinase protein.
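
The exchange of site-flanked sequences described above can be
pictured with a minimal sketch. The list-of-labels
representation, the exchange_cassette function, and the
element names are hypothetical. The sketch simply swaps the
content between two identically named sites and does not
model site orientation or recombination efficiency.

```python
from typing import List, Tuple


def exchange_cassette(target: List[str], donor: List[str],
                      site: str = "frt") -> List[str]:
    """Replace the site-flanked content of target with that of donor."""

    def flanking(elements: List[str]) -> Tuple[int, int]:
        # Locate the two flanking recombinase sites in an element list.
        idx = [k for k, label in enumerate(elements) if label == site]
        if len(idx) != 2:
            raise ValueError(f"expected exactly two {site!r} sites")
        return idx[0], idx[1]

    t0, t1 = flanking(target)
    d0, d1 = flanking(donor)
    # Keep the target's own flanks and sites; take only the donor's payload.
    return target[:t0 + 1] + donor[d0 + 1:d1] + target[t1:]


integrated = ["5' LTR", "frt", "gene trap cassette", "frt", "3' LTR"]
incoming = ["frt", "response element", "promoter", "frt"]
print(exchange_cassette(integrated, incoming))
```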

5.5.4. Biological Assays

As is evident, vectors, particularly retroviral
vectors, incorporating the presently described 3' gene trap
cassette can be used to mutagenize, activate, or control the
expression of endogenous genes in a wide variety of
eukaryotic target cells. Accordingly, the presently
described vectors are particularly useful to practice
molecular genetic techniques in plants as well as higher
eukaryotes such as birds, fish, and mammals. Examples of
such molecular genetic techniques include both in vitro and
in vivo screens for gene activation, mutation, and
regulation.

For example, CD4 positive human T cells can be infected
with the presently described vectors in vitro, and
subsequently infected with a cytopathic strain of human
immunodeficiency virus (HIV). Cells that are capable of
surviving HIV infection can be isolated and rapidly screened
for genetic mutations that are associated with HIV
resistance.

Another screening strategy that can be employed in vitro
is mutating transformed cells with the described gene trap
vectors and selecting for mutations that prevent rapid
proliferation of the transformed cells. This strategy can be
used to identify oncogenes or tumor suppressor genes. After
mutation of the cells, various chemicals can be used to kill
cells that divide rapidly in order to select for insertions
in genes that play a role in cell proliferation and the
transformed phenotype. One example of a chemical that kills
rapidly proliferating cells is bromodeoxyuridine (BrdU)
(Pestov and Lau, 1994, Proc. Natl. Acad. Sci., USA,
91(26):12549-12553). BrdU is preferentially incorporated into
the DNA of rapidly dividing cells and, after the addition of
Hoechst 33258, treatment with fluorescent light negatively
selects against rapidly dividing cells while simultaneously
selecting for slow growing cells.

Another application of cells transduced with the
described vectors is cell based in vitro phenotypic screens
that can be conducted using heterozygous cells, or using
cells that have been cultured or manipulated to homozygosity
(using, for example, high concentrations of antibiotics to
select for homozygous representation of the corresponding
selectable marker gene incorporated into an applicable gene
trap vector) prior to such screening assays.

An in vivo assay contemplated by the present invention
includes the application of vectors employing the 3' gene
trap cassette to mutagenize and screen animals in vivo. In
these assays, the present vectors are used in place of, or in
addition to, classical chemical mutagens such as, for example,
ENU (see generally, Vitaterna et al., 1994, Science, 264:719-
725). For example, test animals can be infected in various
locations, and with varying concentrations of the presently
described viral vectors. Preferable modes of administration
include oral, intranasal, rectal, topical, intraperitoneal,
intravenous, intramuscular, subcutaneous, subdermal,
intracranial, intrathecal, and the like. The aberrant
cellular phenotypes resulting from such mutagenic stimuli can
then be identified, isolated, and screened. Where tumor
cells are observed and isolated, 3' RACE can be used to
rapidly identify the mutation associated with the tumorigenic
phenotype, and thus identify a candidate tumor suppressor
gene or potential oncogene.

An additional in vivo application of the presently
described vectors involves the generation of mutant
transgenic, and somatic transgenic, cells, animals, and
plants that are abnormally resistant or susceptible to
infection by pathogens associated with infectious diseases.

Another powerful application of the present invention is
the large scale production of mutant nonhuman transgenic
animals. Such nonhuman transgenic animals may include, for
example, transgenic pigs, transgenic rats, transgenic
rabbits, transgenic cattle, transgenic goats, and other
transgenic animal species such as birds and fish,
particularly mammalian species, known in the art.
Additionally, bovine, ovine, and porcine species, other
members of the rodent family, e.g., rat, as well as rabbit
and guinea pig and non-human primates, such as chimpanzee,
may be used to practice the present invention. Particularly
preferred animals are rats, rabbits, guinea pigs, and most
preferably mice. Both somatic cell transgenic animals (see
above), and germ line transgenic animals are specifically
contemplated. Additionally, such animals are a source of
tissues and cells for further gene trapping studies using
cultured cells.

The production of mutations in mouse embryonic stem
cells by homologous recombination is well established and has
proven useful for studying gene function in a mammalian
system. However, homologous gene targeting suffers from a
number of limitations. One such limitation is the need for a
gene to be both known and mapped in order to determine the
exon/intron structure of the genomic sequence. Even when a
gene and its structure are known, a targeting vector must be
made for each individual gene one wishes to mutate. This
limits the speed at which large numbers of genes can be
mutated by homologous recombination. The presently described
methods of non-homologous, or nonspecific, 3' gene trapping
and mutation do not suffer from the above limitations.
Generally, nonspecifically inserted, or nontargeted, vectors
can be distinguished from vectors designed for homologous
recombination by the fact that such vectors lack the (often
extensive) flanking regions of homologous targeting sequence
typical of DNA vectors designed to insert sequence by
homologous recombination (see, for example, U.S. Patent No.
5,733,761, herein incorporated by reference).

Other methods can be used to create mutations in mice.
These include chemical or radiation induced mutations which
can be used to mutate genes without any prior knowledge of
the gene. These mutations can be made on a large scale but
often require lengthy and involved processes to identify the
mutated genes by, for example, positional cloning.
Additionally, these mutations are identified only after large
numbers of mice are screened for phenotypes. This
necessitates a large mouse colony, the great expense of
maintaining this colony, and time for breeding animals.
Methods are required that allow the rapid mutation of genes
regardless of prior knowledge of the gene and allow the gene
to be easily identified. Gene trapping as described in the
present invention confers the ability to mutate large numbers
of genes and to allow the (almost) simultaneous
identification of the mutations while still in the embryonic
stem cell stage. This allows for substantial analysis before,
and without, incurring the costs of large scale mouse
production, and, as discussed supra, provides a powerful gene
discovery component. Mice can subsequently be produced from
ES cells containing gene trap mutations in the genes selected,
and the resulting phenotypes can be rapidly identified and
characterized. The resulting knockout mice can subsequently
be bred with other mouse strains, and back crossed to
produce congenic or recombinant congenic animals that allow
for the evaluation of the gene trap mutation in different
genetic backgrounds. A representative listing of various
strains and genetic manipulations that can be used to
practice the above aspects of the present invention
(including the ES cell libraries) is provided in "Genetic
Variants and Strains of the Laboratory Mouse" 3rd Ed., Vols.
1 and 2, 1996, Lyon et al., eds., Oxford University Press,
NY, NY, herein incorporated by reference in its entirety.

Given that altered cellular phenotypes can be associated
with the presently described methods of gene trapping and
activation, additional aspects of the invention are the use
of screening assays to detect altered cellular and animal
phenotypes. Altered phenotypes can also be detected upon
exposing the mutated cells and animals to exogenous materials
and compounds. Additionally, the genes/proteins associated
with the mutant phenotypes can be isolated and subjected to
further biochemical analysis to identify drug candidates that
can alter, replace, interact with, inhibit, or augment the
normal function of the protein.

The present invention is further illustrated by the
following examples, which are not intended to be limiting in
any way whatsoever.

6.0. EXAMPLES 

When vectors containing both SAβgeo (as a 5' exon trap)
and PGKpuroSD (as a 3' exon trap) were tested, it was found
that 13 times as many G418 resistant colonies were obtained
as compared to puro resistant colonies. This indicated that,
in many cases, when SAβgeo trapped a gene, the puro SD
portion of the gene trap vector was unable to effectively
trap the 3' portions of the same gene (as evidenced by the
failure to confer puromycin resistance to the target cell).
In addition, when the G418 resistant colonies were isolated
and subjected to 3' RACE to determine whether puro was
splicing into downstream exons but not at sufficiently high
levels to provide puro selection, it was found that only
about 10% of the colonies yielded a 3' RACE product.
Moreover, the sequence data indicated that splicing was not
occurring in the majority of cases. These data indicated
that the PGKpuroSD 3' gene trap cassette could only splice
into and trap downstream exons of genes with limited
efficiency. Similar inefficiencies have also been observed
using a variety of other selectable markers in addition to
puro. This could be due to the fact that most selectable
markers are derived from microorganisms. For example, the
puro gene was derived from Streptomyces alboniger and
therefore incorporates a codon usage that is distinct from
that typically used by mammalian cells.

In order to test whether codon usage was responsible for
the observed inefficiency in splicing, a puro gene was
synthesized that incorporated an optimal mammalian codon
usage. However, 3' gene trap cassettes that incorporated the
modified puro exon were not efficiently spliced. Another
possible reason for inadequate splicing is that the puromycin
marker is 700 bp long whereas the average length of a first
exon is only about 100 bp. Thus, it further remained
possible that placing a selectable marker gene next to a
promoter hindered the optimal recognition of the puro exon
and splice donor sequence by the splicing machinery.

Given the important discovery that the cellular RNA
splicing machinery could only process the puro gene exon with
limited efficiency, it was reasoned that 3' gene trap
cassettes incorporating naturally occurring mammalian exons
might exhibit markedly enhanced splicing, and hence trapping,
efficiencies. To test this hypothesis, a 3' gene trap
cassette was engineered that replaced the puro exon and
splice donor site with a naturally occurring mouse exon with
a native splice donor sequence as well as a portion of the
naturally occurring intronic sequence following the splice
donor site (the first exon of the mouse btk gene, nucleotides
40,043 to 40,250 of GenBank accession number MMU58105). This
cassette was subsequently inserted 3' to the SAβgeo gene in a
viral gene trap vector. The first exon of the mouse btk gene
was selected because it is about the size of an average
mammalian first exon and, importantly, it had previously been
determined that, although it naturally occurs in the murine
genome, the btk gene is not expressed in murine ES cells.
This feature is important because if it were expressed in ES
cells, the 3' RACE product would always be contaminated with
btk sequence from the endogenous gene and might hinder the
ability to identify the trapped genes. Consequently, a
preferred feature of the 3' gene trap cassette exon is that
it is derived from a naturally occurring gene that is not
normally expressed by the target cell, or not expressed
absent external stimulus or manipulation.

Exons that can be incorporated into the presently
described 3' gene trap cassette can be taken or derived from
sequences that naturally occur in any of a wide variety of
eukaryotic cells (e.g., yeast, insect, fungi, plants, birds,
reptiles, fish, etc.), although animal cells, specifically
mammalian cells, are typically preferred. Alternatively,
exons can be designed and synthesized (e.g., "consensus"
exons) such that they can be efficiently and functionally
processed by the mRNA processing machinery of the eukaryotic
target cell (e.g., splicing, capping, polyadenylation,
transport, and degradation).

Although the first exon of btk has been specifically
exemplified herein, the present invention is not limited to
this exon. Virtually any naturally occurring exon of a
eukaryotic gene, series of exons from one or more eukaryotic
genes, consensus exon, or synthetic exon or exons that are
readily recognized and efficiently processed by the target
cell RNA processing and expression machinery can be
incorporated into the presently described 3' gene trap
cassette. Typically, the first exons are less than about
1,000 bp in length, more preferably less than about 700 bp,
still more preferably less than about 500 bp, and most
preferably less than about 300 bp in length. Examples of
such first exons can be found in, for example, GenBank, and
include, but are not limited to, the first exons from human
growth hormone, erythropoietin, hprt, metallothionein I and
II, maize, wheat, or soybean ribulose 1,5-bisphosphate
carboxylase, rat preproinsulin, male sterility 2 (MS2) gene,
prolifera (PRL) gene, etc.

Given that typical antibiotic resistance markers are not
native to animal or mammalian cells, markers that confer
antibiotic resistance or sensitivity (e.g., Herpes thymidine
kinase) to mammalian target cells are generally not preferred
for incorporation into the presently described 3' gene trap
cassettes. Similarly, given that typically available
enzymatic markers that might be used in chromogenic assays
for the detection and selection of gene trap events (such as
β-galactosidase, horseradish peroxidase, bacterial alkaline
phosphatase, etc.) are also not native to the mammalian
genome, such genes are not preferred for the practice of the
present invention. However, if suitable genetic
manipulations were found that increase the efficiency with
which transcripts encoding the above selectable and enzymatic
markers are processed and expressed by mammalian cells, such
markers could be used to practice the claimed invention.
Although the above selectable markers and enzymatic reporters
are preferably not part of the presently described 3' gene
trap cassette, they can be used as part of the 5' gene trap
component in combination with the described 3' gene trap
cassette.

6.1. Vector Construction

The promoter from the mouse phosphoglycerate kinase
(PGK) gene was placed upstream from the first exon of the
naturally occurring murine btk gene (nucleotides 40,043 to
40,250 of the murine btk gene). The first exon of the btk
gene does not contain a translational start site and
initiation codon marking the 5' region of the coding
sequence; however, these features could be engineered into
the exon if desired. The 3' end of the coding region of the
first exon is marked by a splice donor sequence. Given that
splice donor recognition sequences can extend into intronic
sequence, 103 bases of intron DNA were retained after the end
of the btk first exon. The PGKbtkSD cassette lacks a 3'
polyadenylation signal. Accordingly, any transcript produced
by the cassette cannot be properly processed, and therefore
identified by 3' RACE, unless the transcript is spliced to a
3' exon that can be polyadenylated.

The above 3' gene trap cassette was placed into a
retroviral vector (in reverse orientation relative to the
flanking LTR regions) that incorporated a polyadenylation
site 5' to the PGK promoter of the 3' gene trap cassette; the
neo gene was placed 5' to the polyadenylation site, and a
splice acceptor (SA) site was placed 5' to the neo coding
region to produce a functional SAneopA, or optionally a
SAIRESneopA, 5' gene trap cassette. This vector also
incorporates, in operable combination, a pair of recombinase
recognition sites that flank the PGKbtkSD cassette (see
Figure 2). This vector typically requires that the target
cell naturally express the trapped gene; however, this
requirement can be overcome by adding a promoter that
independently controls the expression of the selectable
marker.
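
The order of elements described in this section can be written down as a simple data structure. The following Python sketch is illustrative only: the element labels and the `check_layout` helper are hypothetical names chosen for this example, the list records only the 5' to 3' order of elements in the sense of the two cassettes, and the LTRs and recombinase recognition sites are left out.

```python
# Illustrative model of the two gene trap cassettes, listed 5' to 3' in the
# orientation of the cassettes (which is reversed relative to the LTRs).
# All labels are hypothetical names for this sketch.
VECTOR = [
    ("SA", "splice_acceptor"),
    ("neo", "marker_exon"),
    ("pA", "polyadenylation_site"),
    ("PGK", "promoter"),
    ("btk exon 1", "exon"),
    ("SD", "splice_donor"),
]


def check_layout(elements):
    """Return a list of problems found in a 5'/3' gene trap layout."""
    kinds = [kind for _, kind in elements]
    problems = []
    # 5' trap: splice acceptor, marker exon, polyadenylation site, with no
    # vector promoter upstream of the marker exon.
    if kinds[:3] != ["splice_acceptor", "marker_exon", "polyadenylation_site"]:
        problems.append("5' trap is not SA-marker-pA")
    # 3' trap: promoter, exon, splice donor, placed 3' to the pA site.
    if kinds[3:6] != ["promoter", "exon", "splice_donor"]:
        problems.append("3' trap is not promoter-exon-SD")
    # No polyadenylation signal may follow the splice donor.
    if "polyadenylation_site" in kinds[6:]:
        problems.append("polyadenylation site found 3' to the splice donor")
    return problems


print(check_layout(VECTOR))
```

An empty list means the layout matches the arrangement described above; appending a polyadenylation site after the splice donor is reported as a problem, since the 3' cassette relies on splicing to a cellular exon for polyadenylation.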

6.2. 3' Gene Trapping

The btk vector was introduced into the embryonic
stem cells using standard techniques. In brief, supernatant
from GP+E packaging cells was added to approximately 2 x 10^6
embryonic stem cells (at an input ratio of approximately 0.1
virus/target cell) for 16 hours and the cells were
subsequently selected with G418 for 10 days. G418 resistant
cells were subsequently isolated, grown up on 96-well plates
and subjected to automated RNA isolation, reverse
transcription, PCR and sequencing protocols to obtain the
gene trapped sequences.

RNA isolation was carried out on DNA bind plates
(Corning/Costar) treated with 5'-amino (dT)42 (GenoSys
Biotechnologies) in a 50 mM sodium phosphate buffer, pH 8.6,
and allowed to sit at room temperature overnight.
Immediately prior to use the plates were rinsed three times
with PBS and twice with TE. Cells were rinsed with PBS,
lysed with a solution containing 100 mM Tris-HCl, 500 mM LiCl,
10 mM EDTA, 1% LiDS, and 5 mM DTT in DEPC water, and
transferred to the DNA binding plate where the mRNA was
captured. After a 15 minute incubation the RNA was washed
twice with a solution containing 10 mM Tris-HCl, 150 mM LiCl,
1 mM EDTA, and 0.1% LiDS in DEPC water. The RNA was then
rinsed three times with the same solution minus LiDS.
Elution buffer containing 2 mM EDTA in DEPC water was added
and the plate was heated at 70° C for five minutes. An RT
premix containing 2X First Strand buffer, 100 mM Tris-HCl, pH
8.3, 150 mM KCl, 6 mM MgCl2, 2 mM dNTPs, RNAGuard (1.5
units/reaction, Pharmacia), 20 mM DTT, QT primer (3 pmol/rxn,
GenoSys Biotechnologies, sequence: 5' CCAGTGAGCAGAGTGACGAGG
ACTCGAGCTCAAGCTTTTTTTTTTTTTTTTT 3', SEQ ID NO: 12) and
Superscript II enzyme (200 units/rxn, Life Technologies) was
added. The plate was transferred to a thermal cycler for the
RT reaction (37° C for 5 min, 42° C for 30 min, and 55° C for
10 min).

6.2.1. PCR Product Generation 

The cDNA was amplified using two rounds of
PCR. The PCR premix contains: 1.1X MGBII buffer (74 mM Tris
pH 8.8, 18.3 mM ammonium sulfate, 7.4 mM MgCl2, 5.5 mM 2-ME,
0.011% gelatin), 11.1% DMSO (Sigma), 1.67 mM dNTPs, Taq (5
units/rxn), water and primers. The sequences of the first
round primers are: Po 5' AAGCCCGGTGCCTGACTAGCTAG 3', SEQ ID
NO: 13; BTKo 5' GAATATGTCTCCAGGTCCAGAG 3', SEQ ID NO: 14; and
Qo 5' CCAGTGAGCAGAGTGACGAGGAC 3', SEQ ID NO: 15 (pmol/rxn).
The sequences of the second round primers are: Pi
5' CTAGCTAGGGAGCTCGTC 3', SEQ ID NO: 16; BTKi
5' CCAGAGTCTTCAGAGATCAAGTC 3', SEQ ID NO: 17; and Qi
5' GAGGACTCGAGCTCAAGC 3', SEQ ID NO: 18 (50 pmol/rxn). The outer
premix was added to an aliquot of cDNA and run for 17 cycles
(95° C for 1 min, 94° C for 30 sec, 58° C for 30 sec, 65° C
for 3.5 min). An aliquot of this product was added to the
inner premix and cycled at the same temperatures 40 times.
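
The nested relationship between the first round (outer) and second round (inner) primers can be checked directly from the sequences given above. The Python sketch below is an illustration, not part of the described protocol: the dictionary keys and helper names are hypothetical, and the check only looks for the longest stretch where the 3' end of an outer primer is repeated at the 5' end of the corresponding inner primer.

```python
# Primer sequences copied from the text (5' to 3'). Dictionary keys and
# function names are hypothetical labels for this sketch.
OUTER = {
    "P": "AAGCCCGGTGCCTGACTAGCTAG",    # SEQ ID NO: 13
    "BTK": "GAATATGTCTCCAGGTCCAGAG",   # SEQ ID NO: 14
    "Q": "CCAGTGAGCAGAGTGACGAGGAC",    # SEQ ID NO: 15
}
INNER = {
    "P": "CTAGCTAGGGAGCTCGTC",         # SEQ ID NO: 16
    "BTK": "CCAGAGTCTTCAGAGATCAAGTC",  # SEQ ID NO: 17
    "Q": "GAGGACTCGAGCTCAAGC",         # SEQ ID NO: 18
}


def suffix_prefix_overlap(outer, inner):
    """Length of the longest suffix of `outer` that is a prefix of `inner`."""
    for k in range(min(len(outer), len(inner)), 0, -1):
        if outer.endswith(inner[:k]):
            return k
    return 0


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


for name in OUTER:
    k = suffix_prefix_overlap(OUTER[name], INNER[name])
    print(name, "overlap:", k, "bases;",
          "outer GC: %.2f" % gc_fraction(OUTER[name]),
          "inner GC: %.2f" % gc_fraction(INNER[name]))
```

For the sequences as printed, the overlaps are 8, 6, and 6 bases for the P, BTK, and Q pairs respectively. Each inner primer therefore starts at the 3' end of its outer primer and extends further into the template, which is the arrangement expected for nested PCR.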

The nested 3' RACE products were purified in a 96-well
microtiter plate format using a two-step protocol as follows.
Twenty-five microliters of each PCR product was applied to a
0.25 ml bed of Sephacryl® S-300 (Pharmacia Biotech AB,
Uppsala, Sweden) that was previously equilibrated with STE
buffer (150 mM NaCl, 10 mM Tris-HCl, 1 mM EDTA, pH 8.0). The
products were recovered by centrifugation at 1200 x g for 5
minutes. This step removes unincorporated nucleotides,
oligonucleotides, and primer-dimers. Next, the products were
applied to a 0.25 ml bed of Sephadex® G-50 (DNA Grade,
Pharmacia Biotech AB) that was equilibrated in MilliQ H2O, and
recovered by centrifugation as described earlier. Purified
PCR products were quantified by fluorescence using PicoGreen
(Molecular Probes, Inc., Eugene, Oregon) as per the
manufacturer's instructions.

Dye terminator cycle sequencing reactions with AmpliTaq®
FS DNA polymerase (Perkin Elmer Applied Biosystems, Foster
City, CA) were carried out using 7 pmoles of primer
(Oligonucleotide OBS; 5' CTGTAAAACGACGGCCAGTC 3', SEQ ID NO: 19)
and approximately 30-120 ng of 3' RACE product. The cycling
profile was 35 cycles of 95° C for 10 sec, 55° C for 30 sec,
and 60° C for 2 min. Unincorporated dye terminators were
removed from the completed sequencing reactions using G-50
columns as described earlier. The reactions were dried under
vacuum, resuspended in loading buffer, and electrophoresed
through a 6% Long Ranger acrylamide gel (FMC BioProducts,
Rockland, ME) on an ABI Prism® 377 with XL upgrade as per the
manufacturer's instructions.
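
When planning plate throughput it can help to total the programmed hold times of a cycling profile. The Python sketch below does this for the cycle sequencing profile stated above. It is an illustration only: the names are hypothetical, and ramp times between temperatures, which depend on the instrument, are deliberately not included.

```python
# Hold times from the text as (degrees C, seconds). Ramp times between
# temperatures are instrument dependent and are NOT included; that
# omission is an assumption of this sketch.
SEQUENCING_CYCLE = [(95, 10), (55, 30), (60, 120)]
SEQUENCING_CYCLES = 35


def program_minutes(steps, cycles=1):
    """Total programmed hold time in minutes for `cycles` repeats of `steps`."""
    return cycles * sum(seconds for _, seconds in steps) / 60.0


print(round(program_minutes(SEQUENCING_CYCLE, SEQUENCING_CYCLES), 1))
```

The same helper can be applied to any other profile expressed as a list of (temperature, seconds) pairs.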

The automated 96-well format was used to obtain
sequence, and data was obtained from 70% of the colonies.
Upon examination, the sequence from the first exon of btk was
identified followed by the btk splice junction. The splice
junction was followed by unique sequences from each separate
gene trap event. These sequences averaged 500 bp in length
and were of high quality, often containing long open reading
frames. In addition, 80% of these sequences can be matched
using BLAST searches to sequences found in the GenBank
database, indicating that transcribed exonic sequences were
identified. These gene trap sequence tags are of
significantly better length and quality than those produced
by previous gene trap designs, and the fact that 80% of the
tags match GenBank sequences suggests that the vectors
efficiently trap genes.
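
The read structure described above (vector-encoded btk exon, then the splice junction, then sequence unique to each trapped gene) suggests a simple way to pull out the gene trap sequence tag from a read. The Python sketch below is a hypothetical illustration: `VECTOR_EXON_END` is a made-up placeholder rather than the real 3' end of the btk first exon, the reads are invented, and exact string matching stands in for whatever alignment method would be used in practice.

```python
# Hypothetical placeholder for the 3' end of the vector-encoded exon; the
# real btk first exon sequence is not reproduced in this sketch.
VECTOR_EXON_END = "GATCAAGTCAG"


def extract_tag(read, exon_end=VECTOR_EXON_END):
    """Return the sequence following the vector exon, or None if absent.

    A read that lacks the exon end is treated as not showing a splice
    from the vector exon into a downstream cellular exon.
    """
    position = read.find(exon_end)
    if position == -1:
        return None
    return read[position + len(exon_end):]


# Invented reads for illustration only.
reads = {
    "clone_A": "TTCCAGAGTCTTCAGA" + VECTOR_EXON_END + "GCTACCTGAAGTTT",
    "clone_B": "ACGTACGTACGTACGT",
}
tags = {name: extract_tag(seq) for name, seq in reads.items()}
print(tags)
```

Tags recovered this way are the sequences that would then be compared against a database such as GenBank.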

These data indicate that the splicing machinery is
better able to recognize an exon type sequence present
adjacent to or relatively close to a promoter when splicing
into downstream exons. These data also indicate that the
majority of G418 resistant colonies can be identified using
gene trap sequence tags. DNA sequence data had already been
obtained that represents approximately 7,000 different genes
trapped by a vector incorporating a PGKpuroSD 3' gene trap
cassette in conjunction with puro selection. Given that it
has already been established that such vectors typically
produce 13 fold more G418 resistant colonies than puro
colonies, vectors incorporating the presently described 3'
gene trap cassette have a very large target size, probably
well over 70,000 genes. This target can be further increased
by using SAneopA rather than the SAβgeo fusion to increase
the sensitivity of antibiotic selection, and any other
selectable, or otherwise identifiable, marker could be used
in the 5' gene trap cassette instead of neo. The use of
IRESneo increased the number of G418 resistant colonies to
over 15X the number of puro resistant colonies, demonstrating
its increased sensitivity. Other potential 5' trapping
markers include, but are not limited to, antibiotic
resistance genes (e.g., β-lactamase), colorimetric marker
genes, genes encoding recombinase activity (e.g., flp or cre,
etc.), enzymes, fluorescent marker genes (e.g., genes
encoding activities that directly or indirectly mediate
cellular fluorescence) such as the gene encoding green
fluorescent protein, and assays for detecting the same, which
are described, inter alia, in U.S. Patent No. 5,625,048,
herein incorporated by reference.
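
One way to read the target size estimate above is as a simple proportional scaling of the roughly 7,000 genes already trapped under puro selection by the fold increase in G418 resistant colonies. The Python sketch below works through that arithmetic. It is a rough illustration, not a calculation stated in the text: the variable names are hypothetical, and the scaling assumes that additional G418 resistant colonies represent additional genes, which is a simplification.

```python
# Figures taken from the text; the proportional scaling itself is an
# assumption of this sketch.
genes_trapped_with_puro = 7000   # approximate number of genes already trapped
g418_to_puro_ratio = 13          # fold excess of G418 over puro colonies
ires_neo_ratio = 15              # "over 15X" with IRESneo, used as a lower bound

estimate_geo = genes_trapped_with_puro * g418_to_puro_ratio
estimate_ires = genes_trapped_with_puro * ires_neo_ratio
print(estimate_geo, estimate_ires)
```

Under these assumptions both figures exceed the 70,000 genes mentioned above, consistent with the statement that the target size is probably well over that number.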

Typically, the more sensitive the selectable marker, the
greater the number of target genes that can be trapped. The
ability to use the btk first exon to obtain gene trap
sequence tags from the 3' exons of the G418 resistant
colonies produced approximately 13 fold more mutated cells
than could be mutated and rapidly sequenced using previous
vectors, and thus represents a significant improvement in
gene trapping technology.

Given the above results, it is clear that the surprising
and unexpected properties that resulted in an order of
magnitude improvement over any previously reported 3' gene
trap cassettes were only realized by departing from our
established selectable marker paradigm for gene trapping.

6.3. Pharmacogenomics

As discussed above, an additional method of
augmenting the target size of the described vectors and
constructs is to dispense with selection altogether, and
use other, i.e., molecular genetic, means to isolate trapped
exons. Using such an approach allows for the rapid generation
and analysis of gene sequence information. In addition to
providing a clear advantage with respect to the speed of
sequence acquisition, the sequencing of gene trapped
libraries allows for substantial cost savings because of the
reduced rate of repeat sequences relative to conventional
cDNA libraries. The economies inherent in the presently
described system of sequence acquisition make it practical to
rapidly obtain a broad based survey of an individual's
genome, or a collection of individuals' genomes, to identify,
inter alia, genetic polymorphisms, particularly SNPs and
cSNPs, that can be associated with disease (where a
portion of the individuals surveyed are known to manifest
common disease traits or symptoms). Additionally, similar
methods can be employed in broad-based genomic assays that
identify the genetic basis for behavioral traits, drug
susceptibility, drug sensitivity, drug allergy, etc. in both
humans and non-human animals.

In such methods, high-to-saturating concentrations of
constructs comprising the described 3' gene trap cassette can
be introduced into suitable target cells, including primary
human or non-human cells (for example, primary nucleated
blood cells such as leukocytes and lymphocytes, etc.), using
established methods. After the 3' sequence acquisition
cassette has integrated into the target cell genome, RNA is
isolated from the target cells, cDNA is produced (and
optionally PCR amplified as described above), and a cDNA
library is constructed. The library is subsequently
sequenced and catalogued/compared relative to a control
library as well as other "experimental" libraries. As SNPs,
cSNPs, or other more gross polymorphisms are identified that
correlate with the "experimental" or "disease" groups, a
catalog of genetic polymorphisms will be developed that
both provides a multi-loci analysis and highlights the
regions of the genome that correlate with specific diseases,
or may otherwise warrant further study and analysis. Such
information can also prove valuable for the identification of
genetic polymorphisms associated with drug effectiveness (or
adverse drug reactions), as well as the design of diagnostic
assays.
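
The comparison of a control library against an "experimental" library can be pictured with a toy example. The Python sketch below is a hypothetical illustration and not the method of the invention: it assumes the sequences from both groups have already been aligned to the same locus and are of equal length, the sequences are invented, and a position is flagged simply when the set of bases observed differs between the groups, with no statistical testing.

```python
from collections import Counter


def allele_counts(sequences, position):
    """Count the bases observed at one aligned position."""
    return Counter(seq[position] for seq in sequences)


def candidate_polymorphisms(control, experimental):
    """Yield (position, control_counts, experimental_counts) wherever the
    two groups differ in the set of bases observed.

    Both inputs are lists of equal-length, pre-aligned sequences from the
    same locus (an assumption of this sketch).
    """
    length = len(control[0])
    for position in range(length):
        c = allele_counts(control, position)
        e = allele_counts(experimental, position)
        if set(c) != set(e):
            yield position, dict(c), dict(e)


control = ["ACGTAC", "ACGTAC", "ACGTAC"]        # invented sequences
experimental = ["ACGTAC", "ACATAC", "ACATAC"]   # invented sequences
hits = list(candidate_polymorphisms(control, experimental))
print(hits)
```

In a real survey the flagged positions would only be candidates; associating them with a disease or drug response would require appropriate sample sizes and statistics.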

7.0. Reference to Microorganism Deposits 

The following plasmid has been deposited at the American
Type Culture Collection (ATCC), Manassas, VA, USA, under the
terms of the Budapest Treaty on the International Recognition
of the Deposit of Microorganisms for the Purposes of Patent
Procedure and Regulations thereunder (Budapest Treaty) and is
thus maintained and made available according to the terms of
the Budapest Treaty. Availability of such plasmid is not to
be construed as a license to practice the invention in
contravention of the rights granted under the authority of
any government in accordance with its patent laws.

The deposited plasmid has been assigned the indicated
ATCC deposit number:

Plasmid         ATCC No.
pbtK            209712

All publications and patents mentioned in the above
specification are herein incorporated by reference. Various
modifications and variations of the described invention will
be apparent to those skilled in the art without departing
from the scope and spirit of the invention. Although the
invention has been described in connection with specific
preferred embodiments, it should be understood that the
invention as claimed should not be unduly limited to such
specific embodiments. Indeed, various modifications of the
above-described modes for carrying out the invention which
are obvious to those skilled in the field of animal genetics
and molecular biology or related fields are intended to be
within the scope of the following claims.

MICROORGANISMS

Optional Sheet in connection with the microorganism referred to on page 58, lines 3-4 of the description

A. IDENTIFICATION OF DEPOSIT

Further deposits are identified on an additional sheet

Name of depositary institution
American Type Culture Collection

What is claimed is:

1. A genetically engineered vector comprising:

a) a 5' gene trap cassette comprising in operable
combination:

1) a splice acceptor;

2) a first exon sequence located 3' to said splice
acceptor, said first exon encoding a marker
enabling the identification of a cell expressing
said exon; and

3) a polyadenylation sequence defining the 3' end
of said first exon;

b) a 3' gene trap cassette located 3' to said
polyadenylation sequence and comprising in operable
combination:

1) a promoter;

2) a second exon sequence located 3' from and
expressed by said promoter, said second exon not
encoding an activity conferring antibiotic
resistance;

3) a splice donor sequence defining the 3' region
of the exon; and

wherein said vector does not encode a promoter mediating the
expression of said first exon, and wherein said vector does
not encode a sequence that mediates the polyadenylation of an
mRNA transcript encoded by said second exon sequence and
expressed by said promoter.

2. A vector according to Claim 1 wherein said first exon
additionally encodes an internal ribosome entry site
operatively positioned between said splice acceptor and the
initiation codon of said first exon.

3. The vector of Claim 1 wherein said second exon and splice
donor sequences are derived from a naturally occurring
eukaryotic gene.

4. The vector of Claim 1 additionally incorporating a
recombinase recognition sequence operatively positioned
upstream from said splice acceptor site.

5. The vector of Claim 4 additionally incorporating in the
region between said polyadenylation sequence and said
promoter a second recombinase recognition sequence.

6. The vector of Claim 1 wherein said first exon encodes a
marker selected from the group consisting of a marker
conferring antibiotic resistance, a marker conferring
antibiotic sensitivity, an enzymatic marker, a recombinase,
and a fluorescently detectable marker.

7. The vector of Claim 6 wherein said marker encodes
neomycin resistance.

8. A genetically engineered retroviral vector comprising:

a) a marker gene expressed by a first vector encoded
promoter; and

b) a 3' gene trap cassette comprising in operable
combination:

1) a second vector encoded promoter;

2) an exon sequence located 3' from and expressed
by said second promoter, said exon not encoding an
activity conferring antibiotic resistance;

3) a splice donor sequence defining the 3' region
of the exon; and

wherein said vector does not encode a sequence that mediates
the polyadenylation of an mRNA transcript encoded by said
exon sequence.

9. A genetically engineered vector comprising:

a) a 5' gene trap cassette comprising in operable
combination:

1) a splice acceptor;

2) a first exon sequence located 3' to said splice
acceptor, said first exon encoding a marker
enabling the identification of a cell expressing
said exon; and

3) a polyadenylation sequence defining the 3' end
of said first exon;

b) a 3' gene trap cassette located 3' to said
polyadenylation sequence and comprising in operable
combination:

1) a promoter;

2) a second exon sequence located 3' from and
expressed by said promoter, said second exon being
of non-prokaryotic origin;

3) a splice donor sequence defining the 3' region
of the exon; and

wherein said vector does not encode a promoter mediating the
expression of said first exon, and wherein said vector does
not encode a sequence that mediates the polyadenylation of an
mRNA transcript encoded by said second exon sequence and
expressed by said promoter.

10. An infectious retrovirus comprising a vector according 
to any one of Claims 1, 8 or 9 . 

25 11. A method of trapping a gene in a eukaryotic target cell 
comprising introducing a retrovirus according to Claim 10 
into said cell. 

12. A method of trapping a gene in a eukaryotic target cell 
comprising introducing a vector according to any one of 
Claims 1, 8 or 9 into said cell, wherein said vector is 
introduced into said target cell by a method selected from 
the group consisting of electroporation, viral infection, 
retrotransposition, microinjection and transfection. 

13. A eukaryotic cell which has a vector according to any 
one of Claims 1, 8 or 9 incorporated into its genome. 

14. A non-human animal which has been genetically modified 
to incorporate a vector according to any one of Claims 1, 8 
or 9 into the genome of one or more cells of said animal. 

15. A method of activating the expression of a naturally 
occurring gene in a cell comprising introducing a vector 
according to any one of Claims 1, 8 or 9 into said cell. 

16. The method of Claim 15 wherein said cell is mammalian. 


17. The method of Claim 16 wherein said mammalian cell is 
selected from the group consisting of a human cell and a 
mouse cell. 

18. A method of altering the expression of a cellularly 
encoded gene in a eukaryotic cell comprising introducing a 3' 
gene trap cassette into said cell, wherein said 3' gene trap 
cassette comprises in operable combination: 
1) a promoter; 

2) an exon sequence located 3' from and expressed 
by said promoter, said exon not encoding an 
activity conferring antibiotic resistance; and 
3) a splice donor sequence defining the 3' region 
of said exon 

wherein said cassette is non-homologously incorporated into 
the genome of a eukaryotic target cell, and said splice donor 
sequence of the transcript encoded by said exon is spliced to 
a splice acceptor sequence of said cellularly encoded gene. 

19. The method of Claim 18 wherein said non-homologously 
incorporated cassette is part of a retroviral vector that has 
nonspecifically integrated into the genome of the eukaryotic 
target cell. 

20. The method of Claim 19 wherein said exon is selected 
from the group consisting of an exon not encoded by the 
target cell genome and an exon not normally expressed by the 
target cell genome. 

21. A method for obtaining novel eucaryotic polynucleotide 
sequence information comprising: 

a) introducing into a eucaryotic cell a 3' gene trap 
cassette, said cassette comprising in operable combination: 

1) a promoter; 

2) an exon sequence located 3' from and expressed 
by said promoter, said exon not encoding an 
activity conferring antibiotic resistance; 

3) a splice donor sequence defining the 3' region 
of the exon; 

b) maintaining the cell under conditions allowing the 
nonspecific or nontargeted integration of the gene trap 
cassette into the genome of the cell; 

c) obtaining the chimeric transcript resulting from the 
splicing of said exon from said 3' gene trap cassette to a 
second exon encoded by the genome of said eucaryotic cell; 
and 

d) determining the polynucleotide sequence of said 
chimeric transcript. 
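
As an illustration of steps c) and d), the sketch below shows one way the cellular portion of a sequenced chimeric transcript might be separated from the vector-encoded exon. It is a minimal sketch only: the function name `trapped_sequence` and both example sequences are hypothetical and are not taken from this document, and exact string matching is assumed in place of a real alignment step.

```python
def trapped_sequence(chimeric_transcript: str, vector_exon: str) -> str:
    """Return the part of a chimeric transcript lying 3' of the vector exon.

    Assumes the vector exon appears verbatim in the transcript; a real
    pipeline would tolerate sequencing errors by aligning instead.
    """
    transcript = chimeric_transcript.lower()
    exon = vector_exon.lower()
    start = transcript.find(exon)
    if start == -1:
        raise ValueError("vector exon not found in transcript")
    # Everything after the vector exon was spliced on from the cellular gene.
    return transcript[start + len(exon):]


# Hypothetical sequences, used only to exercise the function.
VECTOR_EXON = "atggctagctaggtacc"
CELLULAR_EXON = "ggatccgattacagtc"
chimeric = VECTOR_EXON + CELLULAR_EXON

print(trapped_sequence(chimeric, VECTOR_EXON))
```

The returned string corresponds to the sequence downstream of the splice junction, which is the portion that carries new information about the trapped gene.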

SEQUENCE LISTING 
<110> Lexicon Genetics Incorporated 

<120> VECTORS FOR GENE MUTAGENESIS AND GENE 
DISCOVERY 

<130> 8535-0033-228 

<150> US 60/079,729 
<151> 1998-03-27 

<150> US 60/081,727 
<151> 1998-04-14 

<150> US 09/057,328 
<151> 1998-04-08 

<160> 19 

<170> FastSEQ for Windows Version 3.0 

<210> 1 
<211> 43 
<212> DNA 

<213> Mus musculus 
<400> 1 

gcaaccagta acctctgccc tttctcctcc atgacaacca ggt 

<210> 2 

<211> 41 

<212> DNA 

<213> Adenovirus 

<400> 2 

gatgatgtca tacttatcct gtcccttttt tttccacagc t 

<210> 3 
<211> 35 
<212> DNA 

<213> Mus musculus 



<400> 3 

ggcggtcagg ctgccctctg ttcccattgc aggaa 

<210> 4 

<211> 42 

<212> DNA 

<213> Mus musculus 






<400> 4 

tgtcagtctg tcatccttgc cccttcagcc gcccggatgg cg 

<210> 5 
<211> 39 
<212> DNA 

<213> Mus musculus 
<400> 5 

tgctgacacc ccactgttcc ctgcaggacc gccttcaac 



<210> 6 
<211> 34 
<212> DNA 

<213> Mus musculus 
<400> 6 

taattgtgta attattgttt ttcctccttt agat 

<210> 7 
<211> 40 
<212> DNA 

<213> Mus musculus 






<400> 7 

cagaatcttc tttttaattc ctgattttat ttctatagga 

<210> 8 
<211> 37 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic sequence 
<400> 8 

tactaacatt gccttttcct ccttccctcc cacaggt 

<210> 9 
<211> 37 
<212> DNA 

<213> Mus musculus 
<400> 9 

tgctccactt tgaaacagct gtctttcttt tgcagat 

<210> 10 
<211> 36 
<212> DNA 

<213> Mus musculus 
<400> 10 

ctctctgcct attggtctat tttcccaccc ttaggc 

<210> 11 
<211> 35 
<212> DNA 

<213> Mus musculus 
<400> 11 

attaattact ctgcccattc ctctctttca gagtt 

<210> 12 
<211> 52 
<212> DNA 

<213> Artificial Sequence 






<220> 

<223> Primer 
<400> 12 

ccagtgagca gagtgacgag gactcgagct caagcttttt tttttttttt tt 

<210> 13 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 

<400> 13 
aagcccggtg cctgactagc tag 

<210> 14 
<211> 22 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 

<400> 14 
gaatatgtct ccaggtccag ag 

<210> 15 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 

<400> 15 
ccagtgagca gagtgacgag gac 

<210> 16 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 

<400> 16 
ctagctaggg agctcgtc 

<210> 17 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 

<400> 17 
ccagagtctt cagagatcaa gtc 

<210> 18 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 

<400> 18 
gaggactcga gctcaagc 

<210> 19 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 

<400> 19 
ctgtaaaacg acggccagtc 
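
The entries in the listing can be checked mechanically. The sketch below, offered only as an illustration, compares the declared <211> length of an entry with its actual residue count, flags any character outside a, c, g and t, and locates one primer inside another. The three entries are copied from SEQ ID NOS: 12, 15 and 18 above; the helper names `clean`, `entry_is_consistent` and `offset_within` are hypothetical.

```python
# SEQ ID NO -> (declared <211> length, sequence as printed in the listing)
LISTING = {
    12: (52, "ccagtgagca gagtgacgag gactcgagct caagcttttt tttttttttt tt"),
    15: (23, "ccagtgagca gagtgacgag gac"),
    18: (18, "gaggactcga gctcaagc"),
}


def clean(seq: str) -> str:
    """Remove the spacing used for display and normalise to lower case."""
    return "".join(seq.split()).lower()


def entry_is_consistent(declared_length: int, seq: str) -> bool:
    """True if the residue count matches <211> and only a/c/g/t occur."""
    residues = clean(seq)
    return len(residues) == declared_length and set(residues) <= set("acgt")


def offset_within(inner_id: int, outer_id: int) -> int:
    """0-based position of one listed sequence inside another, or -1."""
    return clean(LISTING[outer_id][1]).find(clean(LISTING[inner_id][1]))


for seq_id, (length, seq) in LISTING.items():
    print(seq_id, entry_is_consistent(length, seq))

print(offset_within(15, 12), offset_within(18, 12))
```

For these three entries the declared lengths match, and SEQ ID NOS: 15 and 18 both occur within SEQ ID NO: 12, at 0-based offsets 0 and 17 respectively.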